ABSTRACT

The “semantic integrity” of a data base is said to be violated when the data base
ceases to represent a legitimate configuration of the application environment it is intended
to model. In the context of the relational data model it is possible to identify multiple
levels of semantic integrity information: (1) the description of the domains of the data base,
as abstract sets of atomic data values (domain definition), (2) the specification of the
fundamental structure of the relations of the data base (relation structure specification), (3)
the definition of the abstract operations which are meaningful in terms of the application
environment (structured operations), and (4) the expression of additional semantic
information not contained in the structure of the relations nor in the identities of their
underlying domains (relation constraints).

A high level, nonprocedural domain definition language facilitates the description of 
domains. Such a language allows the specification of the properties of the values 
constituting a domain, and the action that is to occur if an attempt is made to update a 
column entry such that it does not belong to the underlying domain of that column. The
specification of relation structure and structured operations can also be accomplished by
means of high level languages.

A relation constraint has three components: (1) the assertion (a predicate on the state
of the data base or on transitions between data base states), (2) the validity requirement (the
occasion(s) at which the assertion must hold), and (3) the violation-action (the action that is
to occur if the assertion does not hold at a time when it should). Relation constraint
specification can be related to an expression framework (classification scheme) which is
useful for the construction of a relation constraint language and specification methodology.
Assertions are more than expressions of some relationships among different values in a

1. INTRODUCTION

Rather than just a collection of values, a data base should be a model of some
application environment. When a data base ceases to represent a valid configuration of
that application environment, the semantic integrity of the data base is violated. The
purpose of this thesis is to examine the problem of describing and preserving the semantic
integrity of a data base in the context of a generalized data base system. The general goal
is to provide a first approximation to a “theory” of semantic integrity (particularly in the
context of the relational data model), and to provide a basis for a semantic integrity
specification methodology. This includes an overview of the relevant issues as well as a
description of a particular approach to the problem, with emphasis on the high level,
nonprocedural expression of semantic integrity requirements.

Data base systems (data base management systems) are intended to assume the tasks 
of facilitating data storage, manipulation, and retrieval. The data base system should also 
be responsible for maintaining the correctness of the data in a data base, as well as 
providing users with appropriate abstract views of the data. This is particularly important 
for large data bases, as ad hoc and “hand” checking is impractical. 

By way of background, it might be useful to place the notion of semantic integrity in 
perspective, and to better define the meaning of the term as used in this thesis. There are 
a number of ways in which the soundness of data in a data base may be compromised: 

1. The reliability of data may be compromised by errors due to hardware failure, as
well as those due to failure of the operating system and data base system software.
Hardware reliability (in the context of data base systems) has been considered
elsewhere [Fossum 1974, Wilkes 1972]. Software reliability is a very prominent
research concern at present, as exemplified by the work of those concerned with
establishing the correctness of programs. In the area of data base systems,
Hawryszkiewycz and Dennis [Hawryszkiewycz 1972, Hawryszkiewycz 1973] have
developed a formal semantic model of a relational data base system, defined data
base primitive operations in terms of this model, and proven the correctness of the
operation definitions (abstract programs). Weber [Weber 1978] has further developed
this approach.

2. The concurrent consistency of data may be violated due to the effects of
improperly controlled access to shared data by multiple concurrent users (processes).
It is desirable to provide each user with a consistent view of a data base, shielding
this user from interfering effects due to the activities of other users, while at the
same time retaining a maximum amount of legitimate concurrent activity. Eswaran,
Gray, Lorie, and Traiger [Eswaran 1974] have described a high level scheme for
concurrent consistency control in a relational data base system. Hawryszkiewycz and
Dennis [Hawryszkiewycz 1972, Hawryszkiewycz 1973] developed a lower level model of
concurrent consistency based on a formal semantic model of a relational data base
system.

3. Data security may be compromised by a failure to properly (administratively)
restrict the manner in which a given user may access and manipulate a data base. A
good deal of pioneering effort in the area of security and protection has been
accomplished in the context of operating systems. Some of this work has been
extended to data base systems; e.g., the work of Chamberlin, Gray, and Traiger
[Chamberlin 1978], and Stonebraker and Wong [Stonebraker 1974].

4. The semantic integrity of data is violated when the data base ceases to represent a
legal configuration of the application environment it is intended to model. Semantic
integrity errors may be introduced by user error, lack of understanding, malice, etc.

[...] inadvertent, improper, or malicious update [Stonebraker 1974a]. In fact, hardware
[...] cause the semantic integrity specifications of a data base to be violated. For example,
some user may, because of a failure of the data base security mechanism, make an
unauthorized change, such as raising his own salary from $20,000 to $90,000; this
unauthorized change can then cause a semantic integrity constraint to be violated.

This thesis deals specifically with the fourth aspect of the soundness of a data base,
namely semantic integrity. In what follows, we assume that hardware and software
reliability are guaranteed (e.g., by the operating system). We also assume that concurrent
consistency is assured; it is sufficient to assume, without loss of generality, that a single user
is interacting with the system at any given time. Security issues are not further considered
in this thesis.


1.1. Semantic Integrity

A data base is meant to serve as a model of some limited universe; at any given time,
the values in the data base represent a particular configuration of that application
environment. Every such world has its own internal logic: a set of rules specifying what
constitutes a legitimate and plausible configuration of that environment [Florentin 1974]. It
should be the function of the data base system to insure that these rules are not violated
and therefore that the data base is not in a semantically inconsistent state.

A basic premise we will adopt is that, as noted by Minsky [Minsky 1974a], “the
fundamental property of a data base is that it has an intrinsic meaning which is invariant
of its interaction with users”. The semantic integrity specifications for a data base capture
this intrinsic meaning. The data base system should facilitate the precise expression of
these integrity specifications. We assume that some person (or committee of persons),
known as the data base administrator, is responsible for stating the semantic integrity
specifications for the data base.

It is possible and indeed desirable for the data base system to support multiple
abstract logical views of a data base. These views must however be constructed from and
consistent with the semantic integrity specifications (i.e., the data base administrator's view
of the data base). Even providing a view of the data base which consists of a subset of
that data base is difficult, because of the “connections” between the subset and other
elements of the data base.

A variety of causes may lead to compromise of the semantic integrity of a data
base, including:

1. inaccurate data recording or entry,

2. inadvertent alteration of data during some transmission or transcription process,

3. deliberate falsification of data,

4. loss, omission, or delay of data.

The ramifications of permitting incorrect data to permeate a data base may indeed be far
reaching. Crucial decisions may be wrongly influenced, user confidence in the system
destroyed, and the reliability and performance of the system degraded (including
application programs and packages as well as the data base system itself).

It is generally recognized that the problem of bad data in data bases is a serious one.
Unfortunately, the state of the art in error checking in data base systems is quite dismal.
Most semantic integrity checking is currently accomplished by means of application
programs; data checking mechanisms are embedded in these application programs. Special
purpose data base “audit” routines are also sometimes used to check data integrity. Existing
commercial data base systems perform limited types of integrity checking, if any. This
checking is nearly always limited to simple data format checks. In any case, semantic
integrity information and checking is usually unstructured, and is embedded in application
programs in an ad hoc manner [Gosden 1974]; furthermore, no discipline is imposed on the
semantic integrity specification process. This lack of structure and discipline has the
following consequences:

1. The mechanism by [...]

2. Semantic integrity specifications are not readily modifiable.

3. The abstraction defined by [...] intended to [...] the set of rules in the application
environment, is difficult to [...]

4. Inconsistencies and [...] can be present in the semantic integrity
specifications, which may be difficult to locate [...]

5. It is difficult to make the semantic integrity checking process efficient, either by
means of manual or automatic optimization.

1.2. The Data Model

The data model upon which a data base system is based is defined here to consist of
the type(s) of data structures used to represent information in the data base, along with the
set of primitive operations which can be used to manipulate these structures. The nature
of the data model underlying a data base system has a very significant effect on the
manner in which one describes the semantic integrity of a data base in that system. As
described below, some semantic integrity information is often in fact embedded in the
structures used in the data model [Date 1975].

There have been three principal data models proposed for generalized data base
systems [Date 1975]:


1. For historical and other reasons, the hierarchical approach is a very popular one.
Examples of hierarchical data base systems and data sublanguages (languages for
defining and manipulating data bases) include IMS [IBM], HQL [Fehder 1974], Data
Language [Marill 1975], and System 2000 [MRI 1972]. In the hierarchic approach,
some semantic integrity information is expressed in the form of one-to-many
relationships (trees). Thus, one-to-many relationships are expressed by appropriately
constructing the data base hierarchy.

2. The network approach is typified by the Codasyl DBTG proposal [Codasyl 1971a]
and the work of Bachman [Bachman 1973]. An example of a network data base
system is Adabas [Software AG 1974]. In the network data model, some semantic
integrity information is expressed via many-to-many relationships; this is done by
appropriately constructing the network structures of the data base.

3. The relational approach was introduced by Codd [Codd 1970] [Codd 1974a].
Examples of relational data base systems and data sublanguages include ALPHA
[Codd 1971a], INGRES [McDonald 1974a, McDonald 1974b, Held 1975b], MACAIMS
[Goldstein 1970], Query by Example [Zloof 1974, Zloof 1975a, Zloof 1975b], RDMS
[Steuert 1974], RISS [McLeod 1975], SEQUEL [Boyce 1973a, Chamberlin 1974b,
Chamberlin 1975], and SQUARE [Boyce 1973b, Boyce 1975]. In the relational data
model, functional dependencies are normally included in the specification of the basic
structure of relations. However, as discussed in section 1.3, these functional
dependencies may be easily separated from the basic structure of the relations of the
data base.

Several (higher level) semantic data models have been recently proposed [Chen
1975, Schmidt 1975, Senko 1975, Smith 1975, Tsichritzis 1975]. These higher level
models attempt to incorporate more semantic integrity information in the basic
structure of a data base. Structures in these data models are intended to represent
objects, attributes of objects, and relationships between objects (in the application
environment). Semantic operations on these structures [...] in
the application environment.

It is not the purpose here to analyze these data models in detail, although many
of the ideas developed herein are quite closely related to work on semantic data
models. Rather, and for reasons to be explained later, the relational data model will
be used herein as a basis for the discussion of data base semantic integrity.
Although the ideas discussed in this thesis are applicable to data base systems in
general, the discussion is couched in terms of the relational model of data.

1.3. The Relational Data Model

The relational data model “appears to be the simplest data structure consistent
with the semantics of information and which provides a maximum degree of data
independence” [Boyce 1973b]. As concisely stated by Codd [Codd 1974a]: “In the
relational approach there exists an interface at which the totality of formatted data in
a data base can be viewed as a collection of nonhierarchic relations of assorted
degrees on a given collection of simple domains (domains whose elements are not
decomposable as far as the data base management system is concerned).”

For the purposes of this thesis, a (relational) data base is defined to be a
collection of normalized relations (relations in first normal form [Codd 1970]), and a
collection of domains. (The relations present in the data base are specifically called
base relations.) A domain is an abstract set of atomic data values (objects). Domains
are defined independently of relations. A normalized relation may be viewed as a
table, wherein each row of the table corresponds to a tuple of the relation, and the
entries in a column belong to the set of values constituting the underlying domain of
that column. (An entry is the value in some particular column for a given row of a
relation.) The domain underlying a column consists of precisely those objects which
can appear as entries in that column; any value in the underlying domain of a
column can appear in that column, and every value in the underlying domain is a
plausible entry in that column. Note that domain and relation names are unique with
respect to a data base, and that a domain and a relation cannot have the same name.

Consider, for example, a data base which contains information about some
company. Assume that a relation called EMP contains data on the employees of the
company. EMP is shown in figure 1-1, described by its table representation. The
rows of the table correspond to tuples of the relation (records), and the columns
correspond to instances of particular domains of the data base. (Loosely speaking, a
relation corresponds to a “flat” file, a tuple to a record, and a column to a data field.)

Each data base relation is created by naming the relation and each constituent
column, and specifying the name of the underlying domain of each column. More
than one column in a relation may have the same underlying domain. Column
names are unique within a relation. Specifying the name of the underlying domain
of each column defines the set of values from which entries in that column may be
selected; that is, the set of entries in a column is always a subset of the underlying
domain of that column.
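
The description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python is not used in the original text, and the domain predicates, the `Relation` class, and the column names are hypothetical rather than taken from figures 1-1 or 1-2.

```python
from typing import Any, Callable, Dict, List

# A domain is modeled as a membership predicate over atomic values.
Domain = Callable[[Any], bool]

# Hypothetical domains, defined independently of any relation.
DOMAINS: Dict[str, Domain] = {
    "NAME": lambda v: isinstance(v, str) and len(v) > 0,
    "SALARY": lambda v: isinstance(v, int) and 0 < v < 100_000,
}


class Relation:
    """A normalized relation: named columns, each with an underlying domain."""

    def __init__(self, name: str, columns: Dict[str, str]) -> None:
        # `columns` maps each column name to the name of its underlying domain.
        # Dictionary keys keep column names unique within the relation.
        unknown = [d for d in columns.values() if d not in DOMAINS]
        if unknown:
            raise ValueError(f"undefined domain(s): {unknown}")
        self.name = name
        self.columns = dict(columns)
        self.rows: List[Dict[str, Any]] = []

    def insert(self, row: Dict[str, Any]) -> None:
        """Add a tuple; every entry must belong to its column's domain."""
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the relation's columns")
        for column, domain_name in self.columns.items():
            if not DOMAINS[domain_name](row[column]):
                raise ValueError(f"{row[column]!r} is not in domain {domain_name}")
        self.rows.append(dict(row))


# Two columns may share one underlying domain (both salaries use SALARY).
emp = Relation("EMP", {"Name": "NAME", "Salary": "SALARY", "Prev_salary": "SALARY"})
emp.insert({"Name": "Smith", "Salary": 20_000, "Prev_salary": 18_000})
```

Because the check happens on every insert, the set of entries in a column always remains a subset of that column's underlying domain.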

Figure 1-2 contains a description of an example data base. The name of each
domain and relation of the data base is listed therein (in upper case characters). For
each relation, the name of each of its constituent columns is specified (by one upper
case character followed by lower case characters), as is the underlying domain of each
column. Relation EMP contains information on the employees of the company,
SALES records information on the supplies of items for the company, ORDERS
records order information, and BUDGET contains [...] salary [...] for each
[...].

Figure 1-3 contains a list of some example primitive operations which may be
used to interact with a relational data base. It is assumed that in addition to these
operations, a high level, nonprocedural query language is provided (e.g., SEQUEL
[Chamberlin 1974b], QUEL [Held 1975b], or Query by Example [Zloof 1975a]).

The advantages of the relational data model have been previously elucidated
[Codd 1970, Codd 1974a, Date 1974], and will not be repeated here in detail. For our
purposes, the following attributes of the relational data model are of particular interest:

1. Access paths are not apparent in the logical view of the data base.

2. The data model is conducive to (relatively) nonprocedural selection, query, and
manipulation.

3. It is possible to cleanly isolate the different levels of semantic integrity using the
relational data model, as discussed in chapter 2. For example, in the hierarchical and
network data models, certain types of integrity constraints are deliberately built into
the data structure itself (e.g., the owner-coupled set construct in the network model).
The data base administrator is thus faced with the problem of separating the semantic
integrity requirements from the complexities of the data structure. However, in the
relational data model, “the data base administrator has only one type of structure to
consider, and a very simple coordinate system (identification of relations and columns
by name and rows by content) by which he may indicate the individual item or
portion of that structure” [Date 1974].

2. SEMANTIC INTEGRITY


In the context of the relational data model, it is possible to identify four principal
levels of semantic integrity:

1. Domain definition is the description of abstract sets of atomic data values, which
are to be used to specify the set of values from which entries in columns of relations
can be selected. This can be accomplished by means of a high level domain
definition language [McLeod 1976a, McLeod 1976b]. For example, the domain
SALARY may be defined as consisting of positive integers less than 100,000.

2. Relation structure specification is the description of the fundamental structure of
the base relations. This includes naming each constituent column of a relation, and
stating the underlying domain of that column.

3. Structured operations are abstract operations which are meaningful in terms of the
application environment. Structured operations describe data base transactions, and
are used to capture the conceptual types of manipulations that are meaningful for a
data base (such as, for the example data base of figure 1-2, an operation
HIRE-EMPLOYEE).

4. The relation constraints level is concerned with relationships among data base
components. Relation constraints are used to define all additional semantic properties
of and relationships between the relations of a data base. For example, primary key
[Codd 1970] (and third normal form [Codd 1971b, Codd 1971c]) specification is
accomplished by appropriate relation constraints. However, relation constraints go
far beyond merely supporting functional dependencies; they provide the capability to
define a very rich variety of types of data properties. For example, relation
constraints may disallow inconsistencies between column entries of a single tuple or
between a tuple and other tuples in the same or other relation(s). They may also
preclude some global patterns in some set of tuples in a relation, or the data base as a
whole, or may disallow certain types of missing data (such as [...]
values, etc.).
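
To make the fourth level more concrete, the following Python sketch expresses two relation constraints as predicates over the current contents of relations: a primary key constraint on one relation, and a constraint relating tuples of two relations. It is an illustration only. The column names (`Name`, `Dept`, `Salary`, `Salary_budget`) and both function names are assumptions introduced here, not definitions from the thesis.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def key_is_unique(rows: List[Row], key: Tuple[str, ...]) -> bool:
    """Primary key constraint: no two tuples agree on all key columns."""
    seen = set()
    for row in rows:
        value = tuple(row[column] for column in key)
        if value in seen:
            return False
        seen.add(value)
    return True


def salaries_within_budget(emp: List[Row], budget: List[Row]) -> bool:
    """Inter-relation constraint: total salaries per department fit its budget."""
    totals: Dict[str, int] = {}
    for row in emp:
        totals[row["Dept"]] = totals.get(row["Dept"], 0) + row["Salary"]
    limits = {row["Dept"]: row["Salary_budget"] for row in budget}
    # A department with no budget tuple is treated as having a budget of zero.
    return all(total <= limits.get(dept, 0) for dept, total in totals.items())


emp = [
    {"Name": "Adams", "Dept": "sales", "Salary": 20_000},
    {"Name": "Baker", "Dept": "sales", "Salary": 25_000},
]
budget = [{"Dept": "sales", "Salary_budget": 50_000}]

constraints_hold = key_is_unique(emp, ("Name",)) and salaries_within_budget(emp, budget)
```

Neither property follows from the column structure or the domains alone, which is why it belongs to the relation constraints level.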

Before further describing the approach to semantic integrity which is taken in this
thesis, we briefly examine other work that has been done in the area of semantic integrity
in data base systems.


2.1. Background 
In general, there are two major approaches to the specification of the semantic
integrity of a data base:

1. In a state snapshot approach, one specifies the set of data base states which are
permissible (valid states). The data base system is responsible for insuring that the
data base is always in a valid state. (As discussed in a later chapter, it may be
necessary to allow the data base to temporarily pass through one or more invalid
states.)

2. In a state-transition approach, the set of legal data base operations is specified.
Depending on the data base state, only certain operations (legal operations) are
allowed to be performed on that state. These operations [...] preserve
the integrity of the data base.

A state snapshot approach to describing the semantic integrity specifications for a
data base involves the expression of logical constraints, which can be viewed as predicates
on the state of the data base. These constraints limit the states of a data base to those that
conform to some expressed limitations. Several authors [Boyce 1973a, Eswaran 1975,
Stonebraker 1974c, Stonebraker 1975c, Zloof 1975b] have discussed semantic integrity
assertions in the context of the relational data model. Graves [Graves 1975] has also
considered the problem of semantic integrity.
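
A minimal Python sketch of the state snapshot idea follows. It assumes a data base state modeled as a dictionary of relations and assertions modeled as Boolean functions of that state; the names `apply_if_valid` and `salaries_legal` and the salary bound are hypothetical. A change is tried on a copy of the state and kept only if every assertion holds on the result.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, List[Dict[str, Any]]]  # relation name -> list of tuples
Assertion = Callable[[State], bool]


def apply_if_valid(
    state: State,
    change: Callable[[State], None],
    assertions: List[Assertion],
) -> State:
    """Return the changed state if every assertion holds on it, else the old state."""
    candidate = copy.deepcopy(state)
    change(candidate)
    if all(check(candidate) for check in assertions):
        return candidate
    return state


def salaries_legal(state: State) -> bool:
    """Hypothetical assertion: every salary is positive and below 100,000."""
    return all(0 < row["Salary"] < 100_000 for row in state["EMP"])


state: State = {"EMP": [{"Name": "Adams", "Salary": 20_000}]}

# An acceptable change yields a new valid state.
accepted = apply_if_valid(
    state, lambda s: s["EMP"].append({"Name": "Baker", "Salary": 30_000}), [salaries_legal]
)

# A change leading to an invalid state is rejected; the old state is returned.
rejected = apply_if_valid(
    accepted, lambda s: s["EMP"][0].update(Salary=250_000), [salaries_legal]
)
```

Copying the whole state is done here only for clarity; it says nothing about how an actual system would implement the check.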

More specifically, Boyce and Chamberlin [Boyce 1973a] introduced the use of
SEQUEL predicates for expressing integrity assertions. For an operation which makes a
data base change to be allowed, the predicates must hold on the data base state which
results as a consequence of the execution of that operation. Eswaran and Chamberlin
[Eswaran 1975] have discussed the functional requirements of a semantic integrity subsystem
and have examined semantic integrity in the context of SEQUEL and System R
[Chamberlin 1975, Eswaran 1975]. Stonebraker and Wong have considered semantic
integrity in terms of the INGRES system and the language QUEL [Stonebraker 1974c], and
introduced the concept of query modification as a tool for the implementation of a semantic
integrity subsystem [Stonebraker 1975c]. Consider the following example of query
modification: a data base operation is attempted which states “increase the salary of each
employee in the sales department by 10%”; assuming the existence of an integrity assertion
which states that “each employee salary is less than $90,000”, query modification would
transform the operation into one which specifies “increase the salary of each employee in
the sales department by 10%, if that increase results in his salary being less than $90,000”.
Zloof has studied the problem of semantic integrity with respect to the expression of
semantic integrity specifications in Query by Example [Zloof 1975b].
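
The query modification example can be sketched in Python as follows. This is not the INGRES mechanism itself, which rewrites QUEL statements; it only imitates the effect by conjoining the assertion with the user's own qualification. The function name, the row layout, and the sample data are assumptions made for illustration, and the bound follows the assertion as quoted above.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def update_with_query_modification(
    rows: List[Row],
    qualifies: Callable[[Row], bool],
    new_salary: Callable[[Row], int],
    assertion: Callable[[int], bool],
) -> int:
    """Update qualifying rows, but only where the new value satisfies the assertion.

    Rows whose updated value would violate the assertion are left unchanged.
    Returns the number of rows actually updated.
    """
    updated = 0
    for row in rows:
        if qualifies(row):
            proposed = new_salary(row)
            if assertion(proposed):  # the conjunct added by query modification
                row["Salary"] = proposed
                updated += 1
    return updated


LIMIT = 90_000  # bound from the assertion quoted above

emp = [
    {"Name": "Adams", "Dept": "sales", "Salary": 50_000},
    {"Name": "Baker", "Dept": "sales", "Salary": 85_000},
    {"Name": "Clark", "Dept": "toys", "Salary": 40_000},
]

count = update_with_query_modification(
    emp,
    qualifies=lambda r: r["Dept"] == "sales",
    new_salary=lambda r: r["Salary"] * 110 // 100,  # a 10% increase
    assertion=lambda s: s < LIMIT,
)
```

Note the design consequence: the offending update is silently skipped rather than reported, so the only available response to a violation is to leave the tuple unchanged.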

In these approaches, facilities are provided to allow the user to state predicates
(expressed in SEQUEL, QUEL, or Query by Example) which are to hold on the data base.
Assertions must be satisfied by the result of a data base change for that change to be
allowed. Several significant problems exist with these approaches:

1. They do not deal with the entire problem of semantic integrity in a relational data
base, but rather focus primarily on relation constraints.

2. They are inadequately flexible with regard to when assertions are to be checked.

3. The types of actions possible upon detection of semantic integrity violations are
limited.

4. No structure is placed on the semantic integrity specifications; assertions are
arbitrary predicates on the state of the data base or on transitions from one data base
state to another.

A state transition approach to semantic integrity specification consists of describing
the set of legal operations which may be performed on a data base. In this approach, the
user is confined to interacting with the data base by means of a limited set of operations.
Semantic integrity information is thus procedurally embedded in the operations. This
approach has been suggested by Minsky [Minsky 1974a, Minsky 1974b] in the context of
data base systems. Related work in the area of the definition of abstract data types (e.g.,
the work of Liskov and Zilles [Liskov 1974]) has much in common with this operational
approach.
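
As a rough Python illustration of the state transition approach, the class below hides the data and exposes only a small set of legal operations, each of which carries its own integrity checks. The class name, the operations, and the salary rule are hypothetical and are not drawn from Minsky's or Liskov and Zilles's work.

```python
from typing import Dict


class EmployeeBase:
    """Users may change the data only through the operations defined here."""

    MAX_SALARY = 100_000  # hypothetical rule, re-checked inside each operation

    def __init__(self) -> None:
        self._salary: Dict[str, int] = {}

    def hire_employee(self, name: str, salary: int) -> None:
        if name in self._salary:
            raise ValueError(f"{name} is already employed")
        if not 0 < salary < self.MAX_SALARY:
            raise ValueError("illegal salary")
        self._salary[name] = salary

    def raise_salary(self, name: str, amount: int) -> None:
        if name not in self._salary:
            raise KeyError(name)
        new_salary = self._salary[name] + amount
        # The same salary rule appears again, embedded in this procedure.
        if amount <= 0 or new_salary >= self.MAX_SALARY:
            raise ValueError("illegal raise")
        self._salary[name] = new_salary

    def salary_of(self, name: str) -> int:
        return self._salary[name]


base = EmployeeBase()
base.hire_employee("Adams", 20_000)
base.raise_salary("Adams", 2_000)
```

The salary rule is stated once as a constant but enforced separately in each procedure, so the integrity information lives in the code of the operations rather than in a declarative specification.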

Some of the most significant problems with the state transition approach are:

1. Semantic integrity information is embedded in procedures in an unstructured
manner, and is consequently hard to modify, and potentially redundant, inconsistent,
and incomplete.

2. The conceptual semantic model of a data base is difficult to abstract from the
procedurally embedded semantic integrity information.

3. It is difficult to verify the correctness of the semantic integrity information, as it is
scattered through the operations.

4. It is not always possible to precisely characterize the set of operations which are
meaningful for a data base at the time the data base is created. Data is often kept in
a data base before uses for it are discovered, or at least before all of its potential uses
are discovered; nevertheless, it is often possible to describe the semantic integrity of
this data by means of properties it must satisfy (e.g., assertions which must hold on
the data).

5. Different data base “views” (external schemas) may include very different sets of
semantically meaningful operations, while still couched in terms of a single data base
schema (conceptual schema). It is difficult to insure the consistency and completeness
of the semantic integrity checking which is performed by the operations in different
views.

6. Some data base operations are not meaningful in terms of the semantic integrity of
a data base, but are nonetheless required in practice (e.g., an operation to change a
person's date of birth, the value of which was originally incorrectly entered into the
system).


2.2. An Approach to Semantic Integrity Specification 


The major goal of this thesis is to provide a first approximation to a “theory” of
semantic integrity, particularly in the context of the relational data model. In so doing, it is
hoped that a basis for a semantic integrity specification methodology will be developed.
This methodology should assist in the formulation of the semantic integrity rules of a given
application environment, and direct the selection of those rules which will constitute the
semantic integrity specifications of a data base (e.g., in the face of implementation cost
tradeoffs).


A semantic integrity subsystem must be capable of performing:

1. semantic integrity checking (error detection),

2. semantic integrity violation localization (determining precisely which data values
are in error),

3. semantic integrity violation-action (reporting/response).

The semantic integrity specification language(s) must provide the user with the ability to
state all information required to perform these tasks. (This includes, of course, a precise
specification of the semantic integrity rules themselves.)
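
The three tasks can be sketched together in Python. The function `enforce`, the per-tuple assertion, and the reporting action below are assumptions chosen to keep the example small; a real subsystem would also have to handle assertions that span several tuples or relations.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def enforce(
    rows: List[Row],
    assertion: Callable[[Row], bool],
    violation_action: Callable[[int, Row], None],
) -> List[Tuple[int, Row]]:
    """Check every tuple, localize the offenders, and run the violation-action."""
    # Tasks 1 and 2: detect violations and record exactly which tuples are in error.
    violations = [(i, row) for i, row in enumerate(rows) if not assertion(row)]
    # Task 3: respond to each violation (here the caller decides how).
    for index, row in violations:
        violation_action(index, row)
    return violations


report: List[str] = []
emp = [
    {"Name": "Adams", "Salary": 20_000},
    {"Name": "Baker", "Salary": -5},
]

violations = enforce(
    emp,
    assertion=lambda r: 0 < r["Salary"] < 100_000,
    violation_action=lambda i, r: report.append(f"row {i}: illegal salary {r['Salary']}"),
)
```

Passing the violation-action as a parameter mirrors the requirement that the response be specifiable rather than fixed by the system.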

Actually, it is desirable not only to encapsulate (in the data base semantic integrity
specifications) knowledge about the semantic integrity of a data base, but also knowledge
about how users will interact with the data base. The meaning of a data base includes the
manner in which users interact with it; semantic integrity and user abstraction are closely
related issues.

Some semantic integrity information is best expressed via a state snapshot approach,
while other information is best expressed in terms of state transitions. The approach
described in this thesis includes both state snapshot and state transition aspects.

Basically then, the approach to semantic integrity taken here has several major
objectives:

1. It should be possible to express semantic integrity specifications:
   a. on a high level,
   b. declaratively, rather than procedurally,
   c. in a structured manner,
   d. abstractly, in a way relevant to the application environment.

2. These specifications should be:
   a. easily modifiable,
   b. nonredundant,
   c. consistent,
   d. complete (as a model of the application environment).

3. Semantic integrity checking should be:
   a. the responsibility of the system (but the system may sometimes need to ask
   for advice from the user),
   b. flexible, allowing appropriate specification of when checking is to be done
   (e.g., after primitive data base change, after conceptual transaction, etc.),
   c. acceptably efficient in terms of the overall performance of the data base
   system.

4. Semantic integrity violation-action should be:
   a. flexible, allowing an appropriate violation-action to be specified (e.g.,
   including error reporting, corrective action, etc.),
   b. sufficiently “localized” so as not to generate time-consuming, expensive, and
   potentially destructive “side effects”.

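
Objective 3b, choosing when checking is done, can be illustrated with a small Python sketch in which assertions are checked only at the end of a conceptual transaction, so the data may pass through a temporarily invalid state in between. The `Transaction` class, the budget data, and the rollback behavior are assumptions made for this illustration and are not the mechanism defined in the thesis.

```python
import copy
from typing import Callable, Dict, List

State = Dict[str, int]


class Transaction:
    """Defers assertion checking until the end of a conceptual transaction."""

    def __init__(self, state: State, assertions: List[Callable[[State], bool]]) -> None:
        self.state = state
        self.assertions = assertions
        self._backup: State = {}

    def __enter__(self) -> State:
        self._backup = copy.deepcopy(self.state)
        return self.state

    def __exit__(self, exc_type, exc, tb) -> bool:
        ok = exc_type is None and all(check(self.state) for check in self.assertions)
        if not ok:
            # Violation-action in this sketch: restore the state seen at entry.
            self.state.clear()
            self.state.update(self._backup)
            if exc_type is None:
                raise ValueError("assertion violated at end of transaction")
        return False  # never suppress an exception raised inside the block


def total_is_100(state: State) -> bool:
    """Hypothetical assertion: departmental budgets always sum to 100."""
    return sum(state.values()) == 100


budget: State = {"sales": 60, "toys": 40}

with Transaction(budget, [total_is_100]) as b:
    b["sales"] -= 10  # temporarily invalid: the total is now 90
    b["toys"] += 10   # valid again before the transaction ends
```

Checking after each primitive change would have rejected the first assignment, even though the transaction as a whole preserves the assertion.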
The approach to semantic integrity described in this thesis may in fact be viewed as
a generalized approach to data base design and/or data definition. That is, we are
attempting to provide a framework by which the data in a data base may be described.
Additionally, the framework described herein may prove useful as a base language into
which specifications in terms of a higher level data model (such as those described in
[Chen 1975, Schmidt 1975, Senko 1975, Smith 1975, Tsichritzis 1975]) may be translated.

3. DOMAIN DEFINITION


The purpose of this chapter is to discuss domain definition, one level of semantic
integrity in the context of the relational data model. Specifically, the precise definition of
domains, viewed as sets of atomic data values, is considered. This includes a review of the
functional requirements for dealing with the problem of domain definition, a discussion
and evaluation of other work that has been done in this area, and a description of a
specific solution to the domain definition problem.


It is important to note that a domain is different from a unary relation. Domains are
abstract sets of atomic data values, and may in fact contain an infinite number of elements.
A relation, by contrast, must contain a finite number of tuples. Abstractly, relations are
subject to change (e.g., by the addition of new tuples), but domains are changed only when
the associated abstraction changes. To a crude first approximation, the set of values
constituting a domain is fixed at the time the data base is defined ("compile time"), while
the set of tuples in a relation is normally changed during the operation of the
data base system ("run time").


Domain semantic integrity errors, i.e., errors which involve the presence of entries in
some column of a relation which do not belong to the domain underlying that column,
occur often enough to justify a mechanism to handle them. Specific experience with a
[...] integrity errors [McLeod ...].
3.1. Describing Sets of Atomic Data Values
As discussed in chapter 2, several approaches to semantic integrity for relational data
bases have been recently presented. As noted in that chapter, all of these approaches
essentially deal with relation constraints, i.e., facilities are provided that allow the user to
state predicates (expressed in SEQUEL, QUEL, or Query by Example) which are to hold
on the data base.
The requirements of domain definition are not adequately supported in these systems.
They lack the capability to allow domains to be precisely defined as abstract sets of atomic
data values. All of these systems allow the data type of each column of a relation (not each
domain of the data base) to be defined, but the possible types are limited and very
representation-oriented. It should be possible, for example, to define domains like
SOCIAL_SECURITY_NUMBER and GEO_COORDINATE, rather than being limited to
such domains as INTEGER and CHARACTER_STRING. It is desirable to be able to
describe a conceptual class of data values. This abstract description is quite different from
a mere specification of the physical representation of the values in a domain; rather, the
semantic properties of the domain are pronounced. The work of Liskov and Zilles [Liskov
1974] concerning abstract data types is related to this notion, in that classes of abstract data
objects (values) are being described.
Boyce and Chamberlin [Boyce 1973a] have proposed attaching attributes to each
column of a relation ("column descriptors"). One of these attributes is the scope of a
column, which specifies the set of permissible values for entries in that column, e.g., salary
is a positive integer less than 20000. Similarly, Zloof [Zloof 1975b] has indicated that
provisions should be made for facilitating the specification of entry "formats" ("their type,
size, etc.").


A detailed scheme is needed to facilitate the precise description of domains, and to
integrate the domain definitions with the structure of the relational data base. Such a
scheme should (at least) satisfy the following criteria:
1. facilitate the precise and detailed description of sets of atomic data values, as
subsets of one of the natural domains: real number and character string (these
"natural" domains are the primitive domains which are used to construct other
domains),
2. provide for the proper abstraction of defining domains independent of their use as
underlying domains of columns in one or more relations,
3. force a domain definition to be a single module, so that domain semantic integrity
information is localized,
4. facilitate automatic domain definition checking and flexible types of action which
are to occur upon detection of a domain definition violation,
5. support specifications that describe when and how domain values can be compared
(e.g., when two values being compared are from the same domain, and when the two
values are from different domains), and converted (e.g., when it is desired to convert
the value in one domain into an "equivalent" value in another domain).


3.2. A Domain Definition Language

A high level, nonprocedural language can be used to express domain definitions. In
this language, each domain in a data base is described by a single domain definition
(domain definition module). The definition of a domain is "instantiated" (bound) at the time
the domain is created. Domain creation may be viewed as compilation of the domain
definition module. Note that a domain definition specifies an underlying set of atomic
values. Domains are not dynamic as are unary relations; rather, they constitute fixed
abstract sets of data values. The definition of a domain may be modified, but this occurs
only when the abstraction has changed.

As noted by Hammer and McLeod [Hammer 1975], three types of information are 
required by the semantic integrity subsystem to deal with domain definitions: 

1. a specification of the set of atomic data values constituting the domain,

2. information describing when the domain definition is to be checked,

3. a specification of the action that is to occur if the domain definition is not
satisfied.
Since we shall assume that domain definitions are checked whenever an entry in some 
column of a relation is created or altered (e.g., by an operation which inserts or updates a 
row), the specification of when a domain definition is to be checked need not be explicit. 
Thus all that need be explicitly expressed in the statement of a domain definition is the 
precise description of the set of values comprising the domain, and the action that is to 
occur if an entry in some column of a relation is created or modified so that it does not 
belong to the underlying domain of that column. 

Each domain definition therefore consists of the following four components, 
represented as clauses in the domain definition language: 

1. Domain name 

2. Description 

The description clause allows the set of atomic data values constituting a domain to
be specified. The set of values constituting a domain is defined as some subset of
one of the two natural domains: real number and character string. Every domain is
thus defined and represented as a subset of the real numbers or of the set of
(varying length) character strings. This specification may be accomplished by:
a. enumerating the domain values,
b. decomposing the domain values by specifying the subunits of which they
are composed,
c. placing restrictions on the set of values by stating predicates that describe a
subset of one of the natural domains,
or a combination of the above. The special data value "null" (undefined) is present
in each domain. This is to allow missing data to be represented in the data base. (It
may sometimes be useful to distinguish an "unknown" value from a value which
"does not make sense" [Florentin 1976], but this distinction is not made here.)
3. Ordering 
The ordering clause is used to indicate how domain values are ordered with regard 
to comparisons with other values in the same domain. This information is important 
in identifying the semantic properties of a domain. One type of ordering 
specification is that the values in a domain inherit the (total) ordering of the natural 
domain of which the domain is a subset. Inherited ordering may also be by subunit 
(e.g., the primary ordering is by one subunit, the secondary ordering by another 
subunit, etc.). Inherited ordering is numeric for domains which are defined as 
subsets of the real numbers and lexicographic for domains which are defined as 
subsets of the character strings. Another type of ordering specification is that no 
ordering exists, in which case only equality comparisons are meaningful. An external 
procedure (i.e., a procedure in some programming language other than the domain 
definition language) can also be used to define the ordering specifications for a 
domain; this procedure is called whenever two values in the domain are to be 
compared. Such a procedure accepts two domain values (which are to be compared) 
and returns the value that is first in the ordering sequence. 


4. Violation-action
The violation-action clause specifies the action that is to occur if an entry in some
column of a relation is created or changed in such a way that the entry does not
belong to the underlying domain of that column. Types of violation-action include:
a. the change may be refused and an error signaled,
b. a particular value, either constant or calculated from the erroneous value by
means of operations (such as concatenate, etc.), may be substituted as
the new value of the entry,
c. a call may be made to an external procedure, the erroneous value being
passed as the argument to the procedure, and the procedure returning the new
value of the entry.
System-generated or user-specified messages may be optionally returned to the user or
calling program. Note that in cases b and c, it may be necessary to recheck the
domain definition after the corrected value of the entry has been determined.
At this point it should be noted that the use of external procedures for ordering and
violation-action specification should be minimized, insofar as possible. The capability for
such use of external procedures is provided for generality and completeness.
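
The interplay of the description and violation-action clauses can be illustrated by a small Python sketch. It is not part of the domain definition language: the names `DomainDefinition`, `check_entry`, and `DomainViolation` are hypothetical, the description clause is modeled simply as a boolean predicate, and the domain `QUANTITY` is invented for the illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class DomainViolation(Exception):
    """Raised when an entry is refused (violation-action 'error')."""


@dataclass
class DomainDefinition:
    name: str
    description: Callable[[Any], bool]                 # membership predicate
    action: str = "error"                              # "error", "substitute", or "call"
    substitute: Optional[Callable[[Any], Any]] = None  # used for "substitute"
    procedure: Optional[Callable[[Any], Any]] = None   # used for "call"

    def contains(self, value: Any) -> bool:
        # "null" (None in this sketch) is present in every domain.
        return value is None or self.description(value)


def check_entry(domain: DomainDefinition, value: Any) -> Any:
    """Return the value to store, applying the violation-action if needed."""
    if domain.contains(value):
        return value
    if domain.action == "substitute":
        corrected = domain.substitute(value)
    elif domain.action == "call":
        corrected = domain.procedure(value)
    else:
        raise DomainViolation(f"the definition of domain {domain.name} is violated")
    # Recheck the corrected value once; signal an error instead of looping.
    if not domain.contains(corrected):
        raise DomainViolation(f"the definition of domain {domain.name} is violated")
    return corrected


# Hypothetical domain: non-negative integers, with 0 substituted on violation.
QUANTITY = DomainDefinition(
    "QUANTITY",
    lambda v: isinstance(v, int) and v >= 0,
    action="substitute",
    substitute=lambda v: 0,
)
```

Under these assumptions, `check_entry(QUANTITY, -3)` yields the substituted value, while a domain whose action is "error" raises `DomainViolation` for any entry outside the domain.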


3.2.1. Details and Examples

Figure 3-1 contains domain definitions for some of the example data base domains.
An indentation-oriented syntax is used in this figure. Examples of values in each domain
are listed (in parentheses) to the right of the corresponding domain definition.

Figure 3-2 contains a specification of the syntax of the domain definition language. 
In figure 3-2, syntactic classes are denoted by lower case strings, while keywords are in 
upper case; actually, the language should include both upper and lower case keywords. 
Optional parts are enclosed in "()", and alternatives are separated by "|".


In figure 3-1, the description clause of the NAME domain definition specifies that it
consists of (character) strings, each of which is composed of a string followed by a ", ",
followed by another string. In this description clause, data values are decomposed into
subunits; the first and third are variable subunits, while the second is constant. Subunits
may be labeled, so that they may be referenced elsewhere in the domain definition. As
stated above, external to a domain definition, the data values constituting a domain are
either atomic numbers or atomic strings. The rule is, if a description clause of a domain
contains only number subunits (variable or constant), then the values in that domain are
numbers, otherwise they are strings. Number and string subunits may be mixed, and if so,
number subunits are converted to string form to yield the string values constituting the
domain. For example, domain MONEY is defined to consist of strings of the form
"$25,000". Values in domain MONEY have two subunits, the first of which is the string
constant "$", and the second of which is a positive number. Values in domain MONEY
are thus represented as strings; the number subunit of any value in domain MONEY is
viewed as a number (and can be manipulated as such, e.g., by "+") when the subunit alone is
considered, but it is viewed as its string "equivalent" with regard to the domain value as a
whole (and can be manipulated by string operations).

The description clause of the domain SEX indicates that it consists of two data 
values: “female” and “male” (in addition to the ever-present “null”). This is an example of 
description by enumeration.

For domain MONEY, the subunit labeled "value" must be greater than or equal to
zero, as specified by the subunit where restriction. A subunit where restriction contains a
predicate that is to be true for the subunit and involves only that subunit; that is, this
predicate is a restriction on the set of numbers or strings which values for this subunit may
have. It is thereby possible to express properties of number subunits involving comparators
(such as "=" and ">") and number constants. It is also possible to state that a number is an
exponential (exponential notation) or an integer (as for domain DATE). For string
subunits, a size (length) specification can be made, the set of characters permissible in a
string can be defined (as for domain ITEM), and a lexicographic comparison
(such as "=" or ">") with constants can be stated.
A global where restriction permits expression of properties involving multiple
subunits, as well as those on domain values viewed as a unit. A global where restriction
contains a predicate that may involve a domain value, subunit values, operations, and
comparators. String operations can be employed to generate substrings, calculate lengths,
perform concatenations, etc. Number operations include the usual arithmetic operations
and "maximum" and "minimum". For example, in the description of domain MONEY, the
global where restriction states that domain values (viewed as strings) must either have two
digits to the right of the decimal point or else have no decimal point. Here, the "right"
operation evaluates to the right substring of the domain value, starting at the character
after the occurrence of the decimal point. (This form of the "right" operation takes two
arguments: a string whose right substring is to be calculated, and another string whose
index in the first string is calculated to determine at which character of the first string the
right substring is to begin.) The operation "present" yields "true" if the first string
specified contains an occurrence of each of the following strings, otherwise it yields "false".
The global where restriction of domain ITEM illustrates the specification of the number of
times some contiguous group of subunits can repeat.
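
The MONEY restrictions described above can be made concrete with a short Python sketch. It is an illustration only, under stated assumptions: the function names `right` and `in_money` are hypothetical, the number subunit is assumed to be written in plain unsigned decimal notation without thousands separators, and "null" is modeled as `None`.

```python
import re
from typing import Optional


def right(s: str, marker: str) -> str:
    """Right substring of s, starting just after the first occurrence of marker."""
    return s[s.index(marker) + len(marker):]


def in_money(value: Optional[str]) -> bool:
    """Membership test for a MONEY-like domain: "$" followed by a number >= 0."""
    if value is None:                      # "null" belongs to every domain
        return True
    if not isinstance(value, str) or not value.startswith("$"):
        return False                       # the constant subunit "$" must come first
    number = value[1:]                     # the variable number subunit
    if not re.fullmatch(r"\d+(\.\d*)?", number):
        return False                       # assumption: unsigned decimal notation
    if float(number) < 0:                  # subunit where restriction (value >= 0);
        return False                       # kept to mirror the clause
    # Global where restriction: no decimal point, or exactly two digits after it.
    return "." not in value or len(right(value, ".")) == 2
```

The subunit check looks only at the number subunit, whereas the global check treats the whole domain value as a string, which mirrors the distinction between subunit and global where restrictions.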

A where restriction may also contain a call of an external boolean procedure (as for
domain ITEM). If this procedure call is in a global where restriction, the procedure is
invoked with the domain value in question as its argument; the procedure returns "true" if
the value is present in the domain, otherwise it returns "false". If the procedure call is in a
subunit where restriction, the procedure is invoked with the subunit value in question as its
argument; it returns "true" if the subunit value is legal, otherwise it returns "false".
Boolean combinations of the above types of where restriction are allowed in both
subunit and global where restrictions, as are conditionals (as for domain DATE). In
addition, an "or" may be used to indicate that the domain contains values that come in more
than one form, i.e., that the domain consists of the union of two or more sets of values,
each of which is defined separately.
The second clause in a domain definition is the ordering clause. This may specify
that no ordering exists on values in the domain ("none"), which means that only equality
comparisons are allowed (as for domain SEX). An ordering specification of "atomic" means
that values in the domain are ordered by the usual numeric or lexicographic ordering,
viewing the domain values as atomic numbers or strings (as for domain QUAN). The
ordering clause may also contain an ordered list of labels (subunit names), indicating that
domain values are ordered according to the values of the specified subunits. The usual
numeric or lexicographic ordering on these subunits is used, and the subunits are taken in
the order listed (as for domains [...] and DATE). Finally, an external procedure can be
used to define the ordering on values in a domain. This procedure is passed the two
values being compared, and returns the value that is first in the ordering sequence (as for
domain ITEM).
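
The four kinds of ordering specification can be summarized in a small Python sketch. This is a hedged illustration, not the language processor: the function `compare`, its parameters, and the helper `date_key` are hypothetical, and the "month/day/year" layout assumed for date strings is chosen only for the example.

```python
from typing import Callable, Optional, Sequence


def compare(a, b, ordering: str = "atomic",
            subunits: Optional[Callable[[object], Sequence]] = None,
            first: Optional[Callable[[object, object], object]] = None) -> int:
    """Return -1, 0, or 1 when a is before, equal to, or after b."""
    if a == b:
        return 0
    if ordering == "none":
        # Only equality comparisons are meaningful for such a domain.
        raise ValueError("only equality comparisons are meaningful")
    if ordering == "atomic":
        # Inherited numeric or lexicographic ordering of the natural domain.
        return -1 if a < b else 1
    if ordering == "subunit":
        # Subunits are compared in the order given by the ordering clause.
        ka, kb = tuple(subunits(a)), tuple(subunits(b))
        return -1 if ka < kb else (1 if ka > kb else 0)
    if ordering == "call":
        # The external procedure returns the value that comes first.
        return -1 if first(a, b) == a else 1
    raise ValueError(f"unknown ordering type: {ordering}")


def date_key(value: str):
    """Hypothetical subunit ordering for 'month/day/year' strings: year, month, day."""
    month, day, year = (int(part) for part in value.split("/"))
    return (year, month, day)
```

For instance, `compare("12/31/1975", "01/01/1976", "subunit", subunits=date_key)` places the 1975 date first, although a purely lexicographic comparison of the two strings would not.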

The third clause in a domain definition is the violation-action clause. As discussed
above, it may specify that an error is to be signaled, indicating that the data base change
specified by a user is incorrect and should be rejected. A system-generated or user-specified
message may be optionally returned to the user or calling program. This is also true for the
other types of violation-action. If the violation-action is specified as "error", then an error
is signaled and a system-generated message is returned (as for domains NAME and DATE).


Domain SEX has a violation-action clause that specifies error signaling with a user-
specified error message. If a system-generated message were desired, the specific message
could be replaced by "SYSTEM-GENERATED". A system-generated message can be of
the form "the definition of domain SEX is violated", or can bear more information if the
system is a bit smarter (e.g., "the definition of domain SEX is violated, it consists of only
the two values 'female' and 'male'"). The "substitute" violation-action allows a constant
value to be substituted as the new value of the entry being created or changed (as for
domain MONEY). A calculated value, obtained via string or number operations, can also
be substituted (as for domain ITEM). In the specification of this calculation, "+" represents
the value that is being checked to determine if it is in the domain. The calculated value is
then checked to make sure that it is in fact a valid domain value; if not, then an error is
signaled (to avoid infinite recursion). The definition of domain QUAN offers an example
of an external procedure call violation-action.


3.3. Implementation Considerations 

The domain definition language processor translates domain definitions into an
internal form used in semantic integrity checking. The semantic integrity subsystem has the
responsibility of determining what checking is to be done whenever some data base change
request is issued by a user. It must also assume the responsibility of performing this
necessary checking. Whenever a new entry is created in a column (e.g., by an insert row
operation) or an existing entry in some row is changed (e.g., by an update row operation),
the system must make sure that this new entry belongs to the underlying domain of the
column in which it occurs. The information in the description clause of the underlying
domain of the column is used for this purpose. If the domain description is violated, the
information in the violation-action clause is used. The ordering information is used when
comparing two values in the same domain, as discussed in chapter 4.
A domain definition may be processed to obtain the information necessary to construct
several internal relations, which are used by the semantic integrity subsystem to facilitate
domain definition checking:

1. The domain definition relation contains a single tuple for each domain of the data
base; this relation has the following columns (with primary key domain name):
a. domain name,
b. description type, which is "simple" if the domain has one nonlabeled subunit
with no where restriction, otherwise "complex",
c. global where restriction,
d. violation-action type, which is "error", "substitute", or "call",
e. violation-action modifier, which for type "substitute" is the
value (constant or calculated) to be substituted, for "call" is the name of the
external procedure to be called, otherwise "null",
f. error/warning message, which is either a constant (user-specified message),
"system-generated", or "null",
g. ordering type, which is "atomic", "none", "subunit" (for subunit specified
ordering), or "call" (for external procedure call ordering),
h. ordering procedure name, which is the name of the external ordering
procedure if the ordering type is "call", otherwise "null".


2. The subunit definition relation contains a tuple for each subunit of each domain;
this relation has the following columns (with primary key domain name, subunit
index):
a. domain name,
b. subunit index, which is the ordinal number of the subunit in the domain
definition,
c. subunit type, which is either "constant" or "variable",
d. label, which for constant subunits is "null",
e. variable subunit class, which is "number", "string", or "oneof", and "null" for
constant subunits,
f. subunit where restriction, "null" if none exists,
g. ordering index, which is the ordinal number of the subunit in the ordering
clause, and "null" if this subunit is not referenced in the ordering clause.
3. The oneof constant relation contains a tuple for each constant in a "oneof"
description of domain values or domain subunit values (for each domain in the data
base with such a "oneof" description); this relation has the following columns (with
all columns in the relation as primary key):
a. domain name,
b. subunit index,
c. oneof constant, which is a constant in the "oneof" list for the subunit
identified by the subunit index (for the domain specified by the domain name).
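
As an illustration of how these three internal relations might be populated, the following Python sketch encodes the SEX domain as tuples. The class names, field names, and the helper `oneof_values` are hypothetical, "null" is modeled as `None`, and both the message text and the choice of "simple" as the description type for SEX are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class DomainRow(NamedTuple):            # domain definition relation
    domain_name: str                    # primary key
    description_type: str               # "simple" or "complex"
    global_where: Optional[str]
    violation_action_type: str          # "error", "substitute", or "call"
    violation_action_modifier: Optional[str]
    message: Optional[str]              # constant, "system-generated", or null
    ordering_type: str                  # "atomic", "none", "subunit", or "call"
    ordering_procedure: Optional[str]


class SubunitRow(NamedTuple):           # subunit definition relation
    domain_name: str                    # primary key, with subunit_index
    subunit_index: int
    subunit_type: str                   # "constant" or "variable"
    label: Optional[str]
    variable_class: Optional[str]       # "number", "string", "oneof", or null
    where: Optional[str]
    ordering_index: Optional[int]


class OneofRow(NamedTuple):             # oneof constant relation (all columns key)
    domain_name: str
    subunit_index: int
    constant: str


# Hypothetical encoding of domain SEX; the message text is invented.
DOMAINS = [DomainRow("SEX", "simple", None, "error", None,
                     "sex must be female or male", "none", None)]
SUBUNITS = [SubunitRow("SEX", 1, "variable", None, "oneof", None, None)]
ONEOF = [OneofRow("SEX", 1, "female"), OneofRow("SEX", 1, "male")]


def oneof_values(rows: List[OneofRow], domain_name: str, subunit_index: int) -> List[str]:
    """Constants permitted for one subunit, as recorded in the oneof relation."""
    return [r.constant for r in rows
            if r.domain_name == domain_name and r.subunit_index == subunit_index]
```

With this encoding, checking an entry against an enumerated domain reduces to a lookup in the oneof constant relation.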
Domain definitions may be utilized to automatically determine the appropriate
physical storage type to be used to represent values in a domain. For strings, a fixed length
character string representation can be used when possible, such as when domain values are
enumerated (via "oneof"), or when an upper bound is placed on the length of string values
in the domain. In other cases, varying length character strings can be used. For numbers,
it may be necessary in many cases to make a compromise for efficiency. Integers ("number
where integer") may be represented by a fixed binary storage scheme (e.g., single word
binary), but it must be clear that this is only an approximation to the domain definition. A
similar situation exists for real numbers: a float binary representation may be used for
storage.
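
The storage selection rule for strings, and the sense in which a fixed binary word only approximates an integer domain, can be sketched in Python. The function names and the storage labels "CHAR(n)" and "VARCHAR" are hypothetical, and a 32-bit two's complement word is assumed for the example.

```python
from typing import Optional, Sequence


def string_storage(oneof: Optional[Sequence[str]] = None,
                   max_length: Optional[int] = None) -> str:
    """Choose a storage type for a string domain from its definition."""
    if oneof:
        # Enumerated values: the longest constant bounds the length.
        return f"CHAR({max(len(v) for v in oneof)})"
    if max_length is not None:
        # An explicit upper bound on length also permits fixed length storage.
        return f"CHAR({max_length})"
    return "VARCHAR"                    # otherwise, varying length strings


def fits_fixed_binary(n: int, bits: int = 32) -> bool:
    """True if integer n is representable in a signed word of the given size."""
    return -(2 ** (bits - 1)) <= n < 2 ** (bits - 1)
```

An integer such as 2**31 belongs to the domain "number where integer" but fails `fits_fixed_binary`, which is the approximation referred to above.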
3.4. Extensions


Important issues to be considered in future research on domain definition include: 


1. It is possible to extend the domain definition language so that previously defined
domains may be used as subunits in the definition of a new domain. If this
hierarchic approach is used, care must be taken by the system to retain domain
definitions until they are no longer referenced in any other domain definition.


2. It may be useful to introduce domain operations. In this approach, operations are
defined for each domain, and manipulation of values in the domain is restricted to
the specified operations. This approach is similar to the notion of abstract data types
of Liskov and Zilles [Liskov 1974]. It may be argued that the approach taken in this
paper is still too representation-oriented. For example, values in the domain
MONEY may be strings or numbers, but this is irrelevant with respect to abstraction.
The important properties of the values constituting a domain may be best
characterized by specifying the operations that are defined on the values in the
domain. Of course, in this case a domain will no longer be defined as a subset of
one of the natural domains (string and real number), and the standardized set of
domain operations (such as ">", "=", "+", etc.) will probably no longer be appropriate.
3. It may be advantageous, in some cases, to defer the checking of domain definitions, 
and not report violations at the time the data is actually entered into the system. For 
example, in the case where a data base is being "bulk loaded” or updates are being 
“batched”, it may be desirable to report all violations of domain definitions at a later 
time, say to an interactive user or as part of a summary report. 

4. The modifiability of domain definitions is a very important issue. It should be
possible for the definition of a domain to be changed as the corresponding
abstraction changes. If this is allowed, then it is necessary to verify that all entries in
columns having a given underlying domain satisfy the new definition of that
domain.

5. It is possible to call an external procedure to verify that a value in question belongs
to a domain. An external procedure call may also be used in the ordering and
violation-action specifications. However, we have no guarantee that the external
procedure is correct. Some reliability is nonetheless guaranteed by the fact that this
external procedure must use the normal data base system interface. In addition, the
domain definition is again checked after the external procedure has terminated.

6. The problem of implementing the domain definition scheme and evaluating its
effectiveness and efficiency has yet to be fully addressed.

7. It may be useful to consider the automatic generation of domain definitions by
attempting to generalize upon a few examples of domain values which are given by a
user. This is, of course, a part of the general problem of the detailed specification of
the user interface which supports the construction of domain definitions.
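
The deferred checking mentioned in item 3 above might be organized as in the following Python sketch, in which a batch of rows is scanned after loading and all domain violations are collected for a later report. The function name `deferred_check`, the row representation, and the sample predicates are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def deferred_check(rows: Iterable[Row],
                   column_domains: Dict[str, Callable[[Any], bool]]
                   ) -> List[Tuple[int, str, Any]]:
    """Collect (row number, column, value) for every domain violation in a batch."""
    violations = []
    for i, row in enumerate(rows):
        for column, in_domain in column_domains.items():
            value = row.get(column)
            # "null" (None here) is present in every domain.
            if value is not None and not in_domain(value):
                violations.append((i, column, value))
    return violations


# Hypothetical batch and membership predicates for two columns.
batch = [
    {"sex": "female", "quan": 3},
    {"sex": "unknown", "quan": -1},
    {"sex": None, "quan": 7},
]
domains = {
    "sex": lambda v: v in ("female", "male"),
    "quan": lambda v: isinstance(v, int) and v >= 0,
}
report = deferred_check(batch, domains)
```

Here `report` lists both violations in the second row, and nothing is rejected at load time; what to do with the offending entries afterwards is left open, as in the text.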
4. RELATION STRUCTURE


Relation structure specification is the description of the fundamental structure of the
(base) relations of a data base. When a relation is created, at least the following must be
done:

1. The relation must be given a name, which is unique with respect to all names of
relations in the data base.

2. The number of columns in the relation must be specified.

3. Each column of the relation must be assigned a unique name (unique with respect
to the names of the columns of the relation).

4. The name of the underlying domain of each column must be specified. A
definition for each domain thus referenced must exist at the time the relation is
created.
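
These creation-time requirements can be expressed as a few checks. The Python sketch below is illustrative only: the function `create_relation`, the dictionary used as a catalog, and the sample relation are hypothetical, and the number of columns is taken implicitly from the length of the column list.

```python
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]                # (column name, underlying domain name)


def create_relation(catalog: Dict[str, List[Column]], domains: Set[str],
                    name: str, columns: List[Column]) -> None:
    """Record a relation structure after checking the creation requirements."""
    if name in catalog:
        raise ValueError(f"relation name {name} is already in use")
    if not columns:
        raise ValueError("a relation must have at least one column")
    names = [column_name for column_name, _ in columns]
    if len(set(names)) != len(names):
        raise ValueError("column names must be unique within the relation")
    missing = [domain for _, domain in columns if domain not in domains]
    if missing:
        raise ValueError(f"undefined underlying domains: {missing}")
    catalog[name] = list(columns)


# Hypothetical use: the domains NAME and MONEY are assumed to be defined already.
catalog: Dict[str, List[Column]] = {}
create_relation(catalog, {"NAME", "MONEY"}, "EMPLOYEE",
                [("name", "NAME"), ("salary", "MONEY")])
```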

It is possible to include other types of information as a part of the fundamental
structure of a relation. For example, the primary key [Codd 1970] of the relation may be
identified. However, at the level of abstraction at which our discussion of semantic
integrity is focused, the identification of the primary key may be viewed as a type of
relation constraint (and expressed as such). Furthermore, there is no compelling reason for
distinguishing the primary key from other candidate keys [Codd 1970]. It is most logical for
a primary key specification to be viewed as a relation constraint, as is the case for other
types of functional dependencies.

Many higher level semantic models for data base design and abstraction (data
definition), e.g., [Smith 1976], consider certain types of relation constraints (such as
functional dependencies) to be special. Functional dependencies are one important type of
constraint, but there are other types which may be equally important (in some application
environment). We believe that it is essential to provide for a broad spectrum of relation
constraint types, and to integrate the formulation of these constraints with the process of
data base design and abstraction. In a later chapter, our approach to relation constraints is
further discussed.


4.1. Additional Column Information
In addition to the column name and the name of its underlying domain, it is useful
in practice to allow two additional attributes to be associated with each column:
1. a narrative description of the column, for documentation purposes,
2. an indicator specifying whether "null" (undefined) values may be present in the
column (thus allowing "null" values to be selectively prohibited from columns).


4.2. Comparability

The kinds of comparisons and manipulations of column entries that are allowed
relate to the semantic integrity requirements of a data base. The term comparability is
used herein to refer to the general problem of determining when and how two or more
values may be compared or otherwise manipulated together. There
are two basic types of comparisons: intradomain comparisons and interdomain
comparisons.

Intradomain comparisons are those in which two values from the same domain are
compared. In this case, the information in the ordering clause of the domain definition is
sufficient to determine how the comparison is to be made.

Interdomain comparisons are those in which two values from different domains are
compared. In this case, values are compared as atomic strings or numbers using a domain
conversion, as defined below.
4.2.1. Domain Conversions

A data base has associated with it a set of domain conversions. Each domain
conversion is specified by means of a domain conversion module. Each such conversion is
a specification of how values in a given domain are converted into "equivalent" values in
another domain, and vice versa. Explicit specification of domain conversions is necessary
because values in different domains belong to different abstract sets, and converting a
value in one domain into an "equivalent" value in another requires knowledge of the
precise nature of the abstract sets corresponding to the two domains involved. For example,
both FEET and INCHES are numbers, but they cannot be meaningfully added without the
use of an appropriate conversion.

Domain conversions are defined independent of the domains (and relations) of a
data base, in the sense that domain conversion modules have no access to the internal
details of a domain definition; domain conversions thus map atomic values in one domain
into atomic values in another. Domain conversion modules can be dynamically created,
deleted, and modified, with the restrictions that:

1. both of the domains referenced in a domain conversion must exist at the time the
conversion is created,

2. if either of the domains referenced in the domain conversion is deleted, the
domain conversion is deleted.

For the purposes of this thesis, it is assumed that domain conversion modules are
written in some high level programming language. This language may be a specialized
one, similar to the domain definition language. For generality, it is permissible to allow this
language to invoke external procedures written in a high level general purpose
programming language.

For example, a conversion for domains DOLLARS and
THOUSANDS_OF_DOLLARS can be defined as:
domain conversion DOLLARS, THOUSANDS_OF_DOLLARS
DOLLARS = 1000 * THOUSANDS_OF_DOLLARS
THOUSANDS_OF_DOLLARS = DOLLARS / 1000

Conversions may be unidirectional as well as bidirectional, and this is the reason for
the seemingly redundant specification in the above example. For more complex types of
conversions, external procedures may be used; for example, we may have:

domain conversion DATE, JULIAN_DATE
    DATE = p1(JULIAN_DATE)
    JULIAN_DATE = p2(DATE)
where p1 and p2 are external procedures.
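
The rules for domain conversions can be illustrated with a minimal Python sketch. It is
not part of the thesis: the class ConversionRegistry and its method names are hypothetical,
the factor of 1000 is only what the domain names DOLLARS and THOUSANDS_OF_DOLLARS
suggest, and the check that both domains exist at creation time is an assumption.

```python
from typing import Callable, Dict, Set, Tuple

Converter = Callable[[float], float]


class ConversionRegistry:
    """Hypothetical registry; conversions see only atomic values, never domain internals."""

    def __init__(self) -> None:
        self.domains: Set[str] = set()
        self.conversions: Dict[Tuple[str, str], Converter] = {}

    def define_domain(self, name: str) -> None:
        self.domains.add(name)

    def define_conversion(self, source: str, target: str, fn: Converter) -> None:
        # Assumption: both domains must already be defined when a conversion is created.
        if source not in self.domains or target not in self.domains:
            raise KeyError(f"undefined domain in conversion {source} -> {target}")
        # One entry per direction, so a conversion may be unidirectional.
        self.conversions[(source, target)] = fn

    def delete_domain(self, name: str) -> None:
        # Deleting a domain also deletes every conversion that references it.
        self.domains.discard(name)
        self.conversions = {
            pair: fn for pair, fn in self.conversions.items() if name not in pair
        }

    def convert(self, value: float, source: str, target: str) -> float:
        return self.conversions[(source, target)](value)


registry = ConversionRegistry()
registry.define_domain("DOLLARS")
registry.define_domain("THOUSANDS_OF_DOLLARS")
registry.define_conversion("THOUSANDS_OF_DOLLARS", "DOLLARS", lambda v: v * 1000)
registry.define_conversion("DOLLARS", "THOUSANDS_OF_DOLLARS", lambda v: v / 1000)
```

Registering each direction separately is what makes the two-line specification only
"seemingly" redundant: omitting one of the two calls yields a unidirectional conversion.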

Structured operations may perform various types of domain comparability operations
on entries in a data base. The standardized set of such domain operations includes "=",
"~=", ">", ">=", "<", "<=", "+", "-", "*", "/", "**", and string and user-defined operations. For
example, some structured operation may check to see if, for some tuple in relation R, the
entry in column A is larger than the entry in column B. (It is assumed that both columns
A and B contain numbers.)

Whether or not values from different domains may be utilized together (compared or
otherwise manipulated) depends upon the nature of the domains and the particular type of
operation that is to be performed on the values in those domains. In order to establish a
first approximation to a set of comparability rules (for the standardized set of domain
operations), three types of comparability are distinguished:

1. equality-type, which is invoked when one of the following types of manipulations
occurs:
   a. values are compared for equality ("=") or inequality ("~="),
   b. numbers are added ("+") or subtracted ("-"),
   c. sets of numbers are manipulated via set operations, such as "maximum" and
      "minimum",
   d. sets of values are manipulated by "union", "intersection", or "difference";

2. ordering-type, which is invoked when values are compared via "<", "<=", ">", or
">=";

3. mixed-type, which is invoked when values are manipulated via multiplication ("*"),
division ("/"), exponentiation ("**"), or any string operation or user-defined operation.

Equality-type comparisons are always allowed if the two values being compared (or
manipulated) are from the same domain, i.e., if the values are from the same column or
from columns with the same underlying domain. If the values are not from the same
domain, i.e., they are from distinct columns with different underlying domains, then they
may be compared if and only if a domain conversion exists between those domains. (All
domain conversions must be explicitly defined.) The domain conversion is used to convert
the value in one of the domains into an "equivalent" value in the other domain, and the
resulting values are then compared. (Another type of conversion could be supported, by
assigning units to each column, and defining units conversions [McLeod 1976b].)

Ordering-type comparisons are allowed if two values are from the same underlying
domain and the ordering of that domain is not "none". The ordering information in the
domain definition is used to determine how the values are to be compared. Ordering-type
comparisons are also allowed if the two values are from different columns, these columns
have different underlying domains, and a domain conversion exists between those two
underlying domains. In this case, the values are compared by using the domain conversion,
as for equality-type comparisons. In any other case, ordering-type comparisons are not
allowed.

Mixed-type comparisons are always allowed. Values can always be manipulated by a
mixed-type operation (with no restrictions). Values that are numbers may be multiplied,
divided, and exponentiated with no limitations, except of course for the requirement that
the values be numbers. Although numbers may be added and subtracted only when they
have the same "units", multiplication, division, and exponentiation can be performed
without any such restriction. It presumably makes sense to divide a value in domain FEET
by a value in domain POUNDS, but it is (normally) not sensible to add these two values.
For mixed-type comparisons, values being manipulated are treated as atomic and domain
conversions are not used. Note that if user-defined domain operations are allowed, they
may be placed in this category by default. More generally, it may be best to allow the user
to specify the comparability type (equality, ordering, or mixed) of each user-defined domain
operation.
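
The three comparability types can be summarized in a short Python sketch. It is only an
illustration under stated assumptions: the function names, the spelling of the set
operations ("max", "min", and so on), and the representation of conversions as a set of
(source, target) pairs are all hypothetical. The rule for ordering across different
domains follows the text literally and checks only that a conversion exists.

```python
from typing import Dict, Set, Tuple

EQUALITY_OPS = {"=", "~=", "+", "-", "max", "min", "union", "intersection", "difference"}
ORDERING_OPS = {"<", "<=", ">", ">="}


def comparability_type(op: str) -> str:
    """Classify an operation; anything unlisted (*, /, **, string, user-defined) is mixed."""
    if op in EQUALITY_OPS:
        return "equality"
    if op in ORDERING_OPS:
        return "ordering"
    return "mixed"


def is_allowed(op: str, domain_a: str, domain_b: str,
               conversions: Set[Tuple[str, str]], ordering: Dict[str, str]) -> bool:
    kind = comparability_type(op)
    if kind == "mixed":
        return True  # values are treated as atomic; no conversion is used
    if domain_a == domain_b:
        # Same domain: equality-type is always allowed, ordering-type only if ordered.
        return kind == "equality" or ordering.get(domain_a, "none") != "none"
    # Different domains: an explicitly defined conversion must exist.
    return (domain_a, domain_b) in conversions or (domain_b, domain_a) in conversions


conversions = {("FEET", "INCHES"), ("INCHES", "FEET")}
ordering = {"FEET": "numeric", "INCHES": "numeric", "POUNDS": "numeric", "NAMES": "none"}
```

With these definitions, adding FEET to INCHES is permitted, adding FEET to POUNDS is not,
and dividing FEET by POUNDS is permitted as a mixed-type operation.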

If the user wishes to state an unusual type of query, such as asking for all employees
whose name is the same as the name of their department, the user may be allowed to "force"
the comparison, by explicitly overriding the restrictions. Entries in the two columns are
then compared using the default numeric or lexicographic ordering, treating the values as
atomic numbers or strings, respectively. The idea is to permit the system to be flexible and
not to allow comparability rules to get in the way when they should not. The best approach
may be to warn the user that an operation may be meaningless, but allow it to proceed if he
demands it. (The semantic integrity of the data base is not really in danger anyway.)

Domain conversions are also useful when a structured operation retrieves an entry
from some column of a tuple in a relation and assigns it to be the new value of some other
entry (in a different column of some tuple in a relation). For example, suppose that the
date an item was shipped by some company (the entry in column Date of relation ORDERS
in the example data base of figure 1-2) is to be copied into the Date column of another
relation, say BIG_ORDERS. (BIG_ORDERS records all orders which request over $1000
of merchandise.) The Date column in BIG_ORDERS has underlying domain
JULIAN_DATE (i.e., dates of the form "6.134"), while the Date column in ORDERS has
underlying domain DATE (i.e., dates of the form "1/20/1976"). Thus the domain conversion
from DATE to JULIAN_DATE can be used to effect the desired assignment.

The general rule for an assignment which takes the entry in a column (A) and
assigns it as the new value of an entry in another column (B) is as follows:

1. If A and B have the same underlying domain, the assignment is performed with no
conversion.

2. If A and B have different underlying domains, then:
   a. if a domain conversion exists from A to B, the conversion is used to effect
      the assignment,
   b. if no such conversion exists, the assignment is not allowed.
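
The assignment rule can be sketched in Python as follows. This is a hedged illustration,
not the thesis's mechanism: the function assign and the table CONVERSIONS are hypothetical,
and because the text gives only one sample of the JULIAN_DATE format, the sketch assumes a
"year.day-of-year" rendering of its own.

```python
from datetime import datetime


def date_to_julian(date_text: str) -> str:
    # Assumed format: four-digit year, a dot, then the zero-padded day of the year.
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return parsed.strftime("%Y.%j")


# Hypothetical table of explicitly defined conversions, keyed by (from, to).
CONVERSIONS = {("DATE", "JULIAN_DATE"): date_to_julian}


def assign(value, domain_a: str, domain_b: str, conversions=CONVERSIONS):
    """Return the value to store in column B, given an entry taken from column A."""
    if domain_a == domain_b:
        return value                      # rule 1: same domain, no conversion
    convert = conversions.get((domain_a, domain_b))
    if convert is None:                   # rule 2b: no conversion, assignment refused
        raise ValueError(f"no domain conversion from {domain_a} to {domain_b}")
    return convert(value)                 # rule 2a: conversion effects the assignment
```

Only a DATE to JULIAN_DATE conversion is registered here, so an assignment in the
opposite direction is refused, which reflects that conversions may be unidirectional.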

5. STRUCTURED OPERATIONS


A very important aspect of data base semantic integrity is the set of operations a user 
may employ to examine and manipulate the data base. It is possible to describe a user's 
view of a data base as consisting of data structures plus operations. Alternatively, one may 
conceptually characterize the user’s abstract view completely by a set of abstract operations, 
as is done in abstract data types [Liskov 1974]. These operations provide a behavioral
specification of the semantics of the data base.

For these reasons, the concept of a structured operation is included in our approach 
to semantic integrity. The principal purpose of a structured operation is to embody a 
conceptual data base transaction: an action which is meaningful and permissible in the 
context of the application environment. For the example data base of figure 1-2, structured
operations may include: hire_employee, fire_employee, raise_salary, place_order,
create_new_department, etc.


5.1. Semantic Integrity Information in Structured Operations 

One approach to preserving the semantic integrity of a data base is to impose the
restriction that the operations that may be performed on a data base are only those in some 
given set. This set of operations should be defined so that it contains only meaningful 
actions. However, the approach of allowing only semantically meaningful operations has 
several problems: 

1. Operations which are not semantically meaningful in the context of the application
environment must be allowed, e.g., to permit errors to be corrected.

2. The set of operations that are to be allowed may depend upon some characteristics
of the data base state. For example, the set of operations O1 may be legal if the data
base is in state S1, but if the data base is in state S2, the set of legal operations may
be O2.

3. The uses of a data base are not fixed, but rather evolve with time. Operations
change and new operations need to be created. If the semantic integrity information
is embedded in these operations, a scan of all data base operations may be necessary
to make such modifications.

4. Often data is maintained in a data base before uses for it are discovered. Thus it
is difficult to characterize the data via a behavioral semantics approach; in some
sense the semantics of the data is known, but the enumeration of the operations
on that data is not.


5.2. The Definition of Structured Operations 

Despite the problems mentioned above, it is useful to be able to define a set of
abstract operations on a data base. To this end, we allow structured operations to be
defined. Structured operations are constructed using:

1. the primitive data base operations (e.g., see figure 1-3),

2. statements in a very high level data selection (query) and data modification
language, such as SEQUEL (or QUEL or Query by Example).

Structured operations are ordered lists of: primitive operations, statements in a data
selection and modification language, and previously defined structured operations.
Allowing previously defined structured operations within new operations enables a
hierarchic organization.

For the example data base of figure 1-2, a structured operation to raise an employee's
salary could be defined:

operation raise_salary (employee_name, new_salary)
    update EMP
    where Name = employee_name
    Salary = new_salary

This structured operation consists of a single SEQUEL-like statement, which updates the 
Salary column of the tuple in EMP with a value in the Name column equal to the first 
parameter of the operation (presumably there is one such tuple). The new Salary value is 
specified as the second parameter. 
Consider an operation to place an order (again in the context of the example data 
base of figure 1-2): 

operation place_order (customer_id, item_id)
    insert_tuple (ORDERS)
    Item = item_id
    Customer = customer_id
    Date_shipped = date()
    Order_number = generate_order_number()

In this example operation, a tuple consisting of all null values is first created, and then its 
columns are given values. Note that two external procedures are called, one to return the 
current date and the other to generate a unique order number. The types of names 
(identifiers) used in the definition of the operation include those of parameters, a relation, 
columns, and external procedures. 
The operation check_credit_and_order could be defined as: 
operation check_credit_and_order (customer_id, item_id)
    if check_credit (customer_id)
    then place_order (customer_id, item_id)
else error 
The operations check_credit and place_order used in this definition are assumed to have 
been previously defined. Note that this operation contains a conditional expression: a 
useful construct we may include in the structured operation language. This of course 
motivates the need for other types of constructs, e.g., for iteration. We may for instance
want to have an operation that takes an arbitrary number of items as parameters and
places an order for each.
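
To make the hierarchic composition concrete, the structured operations above can be
mimicked in Python. This is only a sketch under assumptions: the in-memory dictionary db,
the set GOOD_CREDIT, the counter used for order numbers, and the operation order_items
are hypothetical stand-ins, and the body of check_credit is invented for illustration.

```python
import datetime
import itertools

# Hypothetical in-memory data base; relation and column names follow the text.
db = {
    "EMP": [{"Name": "Jones", "Salary": 10000}],
    "ORDERS": [],
}
GOOD_CREDIT = {"C1"}                   # hypothetical credit data
_order_numbers = itertools.count(1)    # stand-in for generate_order_number()


def raise_salary(employee_name, new_salary):
    for row in db["EMP"]:
        if row["Name"] == employee_name:
            row["Salary"] = new_salary


def place_order(customer_id, item_id):
    # A tuple of all null values is created first, then its columns are filled in.
    row = dict.fromkeys(["Item", "Customer", "Date_shipped", "Order_number"])
    db["ORDERS"].append(row)
    row["Item"] = item_id
    row["Customer"] = customer_id
    row["Date_shipped"] = datetime.date.today()
    row["Order_number"] = next(_order_numbers)


def check_credit(customer_id):
    return customer_id in GOOD_CREDIT


def check_credit_and_order(customer_id, item_id):
    if check_credit(customer_id):
        place_order(customer_id, item_id)
    else:
        raise RuntimeError(f"credit check failed for {customer_id}")


def order_items(customer_id, *item_ids):
    # Iteration: one order per item, built from the previously defined operation.
    for item_id in item_ids:
        check_credit_and_order(customer_id, item_id)
```

A call such as order_items("C1", "I1", "I2") places two orders, each through the credit
check, which shows a new operation built entirely from previously defined ones.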

Thus, in general, it might be desirable to have a structured operation language which
has many of the capabilities of a general purpose programming language. We could
consequently allow structured operations to be written in some high level general purpose
programming language. The details of this are not pursued here.

One important point to note in passing is that structured operations are important
with regard to the specification of when relation constraint assertions are to hold (be
checked). This is further discussed in chapter 6.

6. RELATION CONSTRAINTS


The fourth aspect of semantic integrity in a relational data base system concerns
relation constraints. In this chapter, the requirements for relation constraints are detailed,
and an approach to their specification is presented.

Codd [Codd 1971b, Codd 1971c] has identified the "third normal form" of relations
[Codd 1974a]: "A relation R is in third normal form if it is in first normal form and, for
every attribute collection C of R, if any attribute not in C is functionally dependent on C,
then all attributes in R are functionally dependent on C." Third normal form facilitates the
straightforward expression of some types of relation constraints, namely functional
dependencies. But the class of data properties describable via functional dependencies is
limited.
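
As a small illustration of what a functional dependency constrains, the following Python
sketch tests whether one set of columns functionally determines another in a given set of
tuples. The function name holds_fd and the sample rows are hypothetical; the column names
are those of the EMP relation used elsewhere in the text.

```python
def holds_fd(rows, determinant, dependent):
    """True if equal values in the determinant columns imply equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault records the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample tuples.
EMP = [
    {"Name": "Smith", "Department": "sales", "Salary": 9000},
    {"Name": "Jones", "Department": "sales", "Salary": 15000},
    {"Name": "Lee", "Department": "toys", "Salary": 8000},
]
```

Here Name determines Department, while Department does not determine Salary. A property
such as "each salary is below the manager's salary" cannot be stated in this form at all,
which is the limitation noted above.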

Boyce and Chamberlin [Boyce 1973a] observed that a high level language, such as 
SEQUEL [Chamberlin 1974b, Chamberlin 1975], may be used as a vehicle for the expression
of data properties other than functional dependencies. SEQUEL expressions were shown 
to be useful in expressing such types of properties as “uniqueness of key", “functional 
dependency", "validity check”, and “inter-relational constraints”. 

The integrity assertions of SEQUEL [Boyce 1973a, Eswaran 1975], INGRES
[Stonebraker 1974c], and Query by Example [Zloof 1975b] are used to express varied types
of data properties. However, these facilities basically provide for the unstructured
specification of arbitrary predicates. Although the assertion expression capabilities of
SEQUEL and INGRES are "complete", they do not allow for the analysis of the types of
possible assertions.

Furthermore, the assertions of SEQUEL and INGRES are rather inflexible with
regard to when they are to hold, and what action is to occur if they do not. In SEQUEL
and INGRES, if a data base change is specified which would cause some assertion to be
violated, the data base change is immediately rejected and an error signaled [Eswaran 1975],
or the data base change is modified such that the assertion will be satisfied [Stonebraker
1975c].

In response to this latter objection, a relation constraint is herein defined as an
abstract statement, having three components:

1. the assertion (a property), which is a predicate on the state of the data base or on
transitions between data base states,

2. the validity requirement, which specifies the occasion(s) at which the assertion is to
hold,

3. the violation-action, which is the action that is to occur if the assertion is not
satisfied at a time when it should be.
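
The three components can be pictured as a small Python record. This is a sketch under
assumptions, not a specification from the thesis: the class RelationConstraint, the
encoding of the validity requirement as an occasion name, and the sample constraint are
all hypothetical, and only assertions on states (not on transitions) are covered.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RelationConstraint:
    assertion: Callable[[dict], bool]          # predicate on a data base state
    validity_requirement: str                  # occasion at which it must hold
    violation_action: Callable[[dict], None]   # what to do when it does not hold

    def check(self, db: dict, occasion: str) -> bool:
        """Evaluate the assertion only on the occasion named by the validity requirement."""
        if occasion != self.validity_requirement:
            return True
        if self.assertion(db):
            return True
        self.violation_action(db)
        return False


log: List[str] = []
constraint = RelationConstraint(
    assertion=lambda db: all(e["Salary"] >= 0 for e in db["EMP"]),
    validity_requirement="after raise_salary",
    violation_action=lambda db: log.append("negative salary"),
)
```

Separating the three parts lets the same assertion be paired with a different occasion or
a different violation-action, instead of the fixed "reject immediately" behaviour.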

In response to the former objection, a detailed classification of relation constraints is
presented below. The emphasis is placed on providing a structured framework, which may
be used to construct a high level, abstraction-based, well-directed, and disciplined relation
constraint specification methodology. In so doing, a principal goal is to impose some
structure on the problem of semantic errors in data bases. In this approach, it is important
to keep "an eye toward implementation", although no specific implementation considerations
are included in this thesis.


6.1. Whither Assertion Structure?

We subscribe to the view that the assertion component of a data base relation
constraint should not be viewed as an arbitrary predicate of the first-order predicate
calculus, ranging over tuples of the relations of a data base. Rather, every assertion should
have a well-defined, uniform structure. There are several advantages to taking a
disciplined approach to assertion expression:

1. It provides the data base administrator (or other authority responsible for
expressing the constraints) with a conceptual framework in terms of which to
organize his thinking and structure the formulation of assertion specifications.
Reducing abstract, problem-oriented limitations on configurations of the application
environment to concrete restrictions on values in the data base is essentially a
programming problem. By providing the "programmer" with a theoretical and
general framework for his problem, it is possible to significantly ease his task.

2. The issues of constraint specification which are auxiliary to assertion expression,
namely the validity requirement and violation-action, cannot be satisfactorily
addressed in the absence of the kind of structure proposed herein. The degree to
which a semantic integrity subsystem can respond "intelligently" to a constraint
violation depends upon how well the formulation of the constraint captures the intent
of its expressor.

3. A useful conceptual framework for assertions will provide some measure of the
complexity of individual assertions, providing their expressor with a guide to the cost
of their implementation. Indeed, the structure of an assertion can be used by an
implementation facility as a guide to the strategy for the implementation of its
checking.

It is important to note that insuring that there is a single, unique specification of a
given conceptual constraint is not a major objective here. Rather, the emphasis is placed
on encouraging a "reasonable" formulation: one which accurately models the application
environment abstraction and which is usable by an implementation facility.




6.2. Relation Constraint Assertions 

The assertion component of a relation constraint is a logical predicate on the state of
the data base or transitions between data base states. It expresses some semantic property of
the data base.

Each assertion is either a simple assertion or a combination of simple assertions (a
derived assertion). Simple assertions may be combined using boolean operators and other
connectors (such as "if then else"). The remainder of this section deals with simple
assertions; the generalization to derived assertions is more-or-less straightforward. When
no ambiguity is possible, "assertion" will be used in place of "simple assertion".


6.2.1. Simple Assertions

Every (simple) assertion may be viewed as delimiting certain values of the data base
in terms of certain others. That is, an assertion does not merely express some relationship
among different values in the data base. Rather, it singles out certain values, and identifies
them as being the constrained data of the predicate. The predicate delimits the legal values
of the constrained data in terms of the constraining data. I.e., every assertion constrains
some data with respect to some other; the two are not being bilaterally restricted.

As a consequence, there are two distinct steps in the process of stating an assertion:

1. The data that is being constrained is described. This description is accomplished
in two sequential substeps, in which the following are identified:
   a. the set of all data objects in the data base that are being restricted (the
      constrained collection),
   b. the precise aspect of each of these data objects that is being delimited (the
      restricted expression).
Part a of step 1 utilizes data selection predicates. The predicate expression
capabilities of any data selection or query language may be adapted to accomplish
this task [Chamberlin 1974b, Chamberlin 1975, Codd 1971a, Codd 1971d, Hall 1975,
McLeod 1976c, Held 1975b, Zloof 1974, Zloof 1975a]. For example, consider the
assertion that the salary of each employee in the sales department is less than the
salary of his manager. Here, the constrained collection consists of those tuples in
relation EMP which have "sales" in the Department column. The restricted
expression is the Salary entry of each such tuple. The necessity of first identifying
the constrained collection and then the restricted expression is occasioned by more
rich and complex assertions, as discussed below.

2. The actual predicate of the assertion is stated, which asserts a restriction on the
value of the restricted expression for each member of the constrained collection. The
predicates used therein are called assertion predicates. In general, this restriction
depends on other data in the data base. The other data which participates in the
assertion is called the constraining data, and the expression which computes the
precise delimiting value is called the restricting expression. For example, for the
assertion above, the constraining data (for each tuple) is the tuple in relation EMP
whose Name entry equals the Manager entry of the constrained tuple; the restricting
expression is the Salary entry of the constraining tuple.

Figure 6-1 contains some examples of simple assertions. For each assertion, the
constrained collection and assertion predicate are identified. Note that the "language" used
to specify the assertion predicates is intended only to be illustrative, but is more-or-less
consistent with the "level" of (and directly translatable into) relational data selection
languages such as SEQUEL, QUEL, and Query by Example.
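
The two steps of stating an assertion can be traced in a short Python sketch for the
sales-department example. It is illustrative only: the function names mirror the
terminology of the text, the sample tuples are hypothetical, and it is assumed that every
constrained tuple has a manager tuple in EMP.

```python
# Hypothetical sample tuples; the column names follow the text.
EMP = [
    {"Name": "Smith", "Department": "sales", "Salary": 9000, "Manager": "Jones"},
    {"Name": "Jones", "Department": "sales", "Salary": 15000, "Manager": "Lee"},
    {"Name": "Lee", "Department": "admin", "Salary": 20000, "Manager": "Lee"},
]


def constrained_collection(emp):
    # Step 1a: the tuples being restricted, chosen by a data selection predicate.
    return [t for t in emp if t["Department"] == "sales"]


def restricted_expression(t):
    # Step 1b: the aspect of each constrained tuple that is delimited.
    return t["Salary"]


def constraining_data(t, emp):
    # The tuple whose Name entry equals the Manager entry of the constrained tuple.
    return next(m for m in emp if m["Name"] == t["Manager"])


def restricting_expression(t, emp):
    # The delimiting value, computed from the constraining data.
    return constraining_data(t, emp)["Salary"]


def violations(emp):
    # Step 2: the assertion predicate, applied to each member of the collection.
    return [
        t["Name"]
        for t in constrained_collection(emp)
        if not restricted_expression(t) < restricting_expression(t, emp)
    ]
```

The asymmetry is visible in the code: Lee's tuple is never checked, because it lies
outside the constrained collection, yet it serves as constraining data for Jones.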

6.2.2. Identification of the Constrained Collection

As introduced above, the first step in the specification of an assertion is the
identification of the constrained collection: that which is conceptually being delimited by
the assertion. In general, the constrained collection is a collection of data objects, and the
assertion applies to each of them. In this sense, every assertion is in effect an assertion
schema, which is instantiated for each element of the constrained collection.

An assertion may either express a property of an individual tuple (a tuple assertion),
or a property of a set of tuples considered as a whole (a set assertion). In figure 6-1,
examples 1-4 are tuple assertions, while examples 5-8 are set assertions.

The constrained collection for a tuple assertion is a collection of tuples, to each of
which the assertion applies. The constrained collection for a set assertion, similarly, is a
collection of sets of tuples. The set assertion applies to each tuple set in the constrained
collection. An important (and frequent) special case of a set assertion is that in which the
constrained collection consists of a single set. Note the difference between this special case
and a tuple assertion: in the former, the assertion applies to the tuple set as a whole, while
in the latter it applies to each individual member of it. Thus, in example 1, the constrained
collection has many elements, each of which is a tuple of the EMP relation; in example 5,
the constrained collection consists of a single element, which is the entire EMP relation; in
example 6, the constrained collection has several elements, each of which is a subset of the
EMP relation.

For both tuple and set assertions, defining the constrained collection begins with
identifying some set of tuples (called the underlying relation of the assertion). This tuple
set can then be manipulated, by means of data selection predicates, to ultimately define the
constrained collection.

The underlying relation of an assertion need not be a relation defined as part of the
data base. In general, it may be any of the following:

1. a base relation (a relation explicitly present in the set of data base relations),

2. the cross product of two or more base relations,

3. the union of two or more base relations,

4. the cross product of two or more relations of types 1 and 3, at least one of which is
not a base relation,

5. any relation which can be defined in terms of base relations, not included in the
above (these relations may be constructed using the various selection criteria and
retrieval operators of a data selection language).

For example, EMP is a relation of type 1; EMP cross BUDGET is of type 2. An example of
a relation of type 3 would be the union of relations CURRENT_EMP and OLD_EMP
(where both have the same structure as EMP). An example of a relation of type 5 is
SAL_TOTAL (Department, Sum_salaries), where Sum_salaries is the sum of the salaries of
employees working for the associated department.

The foregoing classification of underlying relations is in order of increasing
complexity, and exhibits the different kinds of relations to which assertions may apply. It
is important to observe that an assertion need not apply only to relations explicitly present
in the data base, but may hold for derived relations.

Once the underlying relation is defined, the precise specification of the constrained
collection can be accomplished. In the case of tuple assertions, the constrained collection is
obtained from the underlying relation by means of data selection predicates. The
complexity of the selection process can be described in terms of the operators of the data
selection language. Selection of the constrained collection is a [...] in the specification
of a relation.

However, in the case of set assertions, there is a need to specify a collection of tuple
sets; each such set is a member of the constrained collection. For illustration, consider the
following tentative taxonomy of the first stage of the specification process for a constrained
collection which consists of tuple sets:

1. The constrained collection may contain a single set of tuples, selected from the
underlying relation. (simple set)

2. A set of tuples may be selected from the underlying relation, and then divided into
groups, e.g., by common value in one or more columns or by intervals of column
values (such as 21 < Age < 30, 31 < Age < 40, etc.). Certain of these groups may then
be chosen based on properties they possess. The constrained collection is thus a
collection of tuple sets, namely the groups that were so chosen. The assertion then
applies to each tuple set in the constrained collection. (grouped set)

3. A set of tuples may be selected from the underlying relation, and those subsets of it
which satisfy a specified property are chosen. An example of such a property might
be that the number of tuples in the subset equals three. These chosen subsets
comprise the constrained collection, and the assertion is applied to each of them.
(property-defined set)
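
The three ways of forming a constrained collection of tuple sets can be sketched in
Python. The helper names and sample tuples are hypothetical, and the property-defined
case enumerates every subset of the selected tuples, which is exponential in their number
and is meant purely as an illustration of the definition.

```python
from itertools import chain, combinations, groupby


def simple_set(rows, select):
    # Simple set: the collection holds one set of selected tuples.
    return [[r for r in rows if select(r)]]


def grouped_sets(rows, select, key, keep=lambda group: True):
    # Grouped set: select, divide into groups by key, then keep the chosen groups.
    chosen = sorted((r for r in rows if select(r)), key=key)
    groups = [list(g) for _, g in groupby(chosen, key=key)]
    return [g for g in groups if keep(g)]


def property_defined_sets(rows, select, has_property):
    # Property-defined set: every subset of the selection that satisfies the property.
    chosen = [r for r in rows if select(r)]
    subsets = chain.from_iterable(
        combinations(chosen, n) for n in range(len(chosen) + 1)
    )
    return [list(s) for s in subsets if has_property(s)]


# Hypothetical sample tuples.
EMP = [
    {"Name": "Smith", "Department": "sales", "Age": 25},
    {"Name": "Jones", "Department": "sales", "Age": 34},
    {"Name": "Lee", "Department": "toys", "Age": 29},
    {"Name": "Diaz", "Department": "toys", "Age": 38},
]
everyone = lambda row: True
```

For instance, grouping by the key `(Age - 1) // 10` yields one tuple set per ten-year age
interval, and the property `len(s) == 3` selects all three-tuple subsets.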

There is a noticeable degree of flexibility in the foregoing framework for identifying
the constrained collection, in that it does not impose a rigid specification methodology on
the expressor of assertions. The criterion of completeness would not demand all the options
for the underlying relation allowed above; it is clear that any assertion can be satisfactorily
specified by letting the underlying relation be the cross product of all the base relations and
performing various operations thereon to compute the constrained collection. However, in
many instances such an "all-at-once" approach would be cumbersome and unnatural. It
might be more convenient to follow a "top-down", step-by-step approach and define a
sequence of derived relations, the last of which is the underlying relation. This can
facilitate the straightforward expression of the assertion.

Consider the following assertion: the sum of salaries of employees of each
department is less than the budget of that department. An all-at-once approach to
expressing this assertion would proceed to identify the constrained collection as the set of
tuples in EMP, grouped by common Department (grouped set). The restricted expression
would be the sum of the Salaries (for each group). The assertion predicate is then
"sum(Salary) < BUDGET.Salary_budget where BUDGET.Department =
common_value_of (Department) (in the constrained tuple set)". Thus the constraining data
is the tuple in BUDGET having the Department column entry equal to the common value
of the entries in the Department column for the constrained tuple set, and the restricting
expression is the Salary_budget column entry of the constraining tuple.

A top-down, step-by-step approach to the expression of the above assertion may
proceed by noting that the assertion could be expressed as a tuple assertion, if there existed
a relation of the form DEPARTMENTS (Department, Sum_of_emp_salaries,
Salary_budget). If such a relation existed, the constrained collection would be each tuple in
relation DEPARTMENTS. The restricted expression would be the column entry
Sum_of_emp_salaries. The assertion predicate would be "Sum_of_emp_salaries <
Salary_budget". Here the restricting expression is the column entry Salary_budget in the
constrained tuple, and the constraining data is the constrained tuple itself.

However, the relation DEPARTMENTS does not exist. Consequently, it is necessary
to specify how it is to be derived from existing base relations. The underlying relation of
the constrained collection is thus a derived relation, i.e., the relation DEPARTMENTS. A
data selection language would be used to construct this derived relation; for example, the
specification could be the following:

DEPARTMENTS (Department, Sum_of_emp_salaries, Salary_budget) =
    select EMP.Department, sum(EMP.Salary), BUDGET.Salary_budget
    from EMP, BUDGET
    where EMP.Department = BUDGET.Department
    group by EMP.Department
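
The same derivation can be imitated in Python to show how the set assertion reduces to a
tuple assertion on the derived relation. This is a sketch with hypothetical sample data;
it assumes one BUDGET tuple per department and, like the join above, omits departments
that appear in only one of the two relations.

```python
from collections import defaultdict


def derive_departments(emp, budget):
    # Group EMP by Department and sum the salaries of each group.
    totals = defaultdict(int)
    for e in emp:
        totals[e["Department"]] += e["Salary"]
    # Join each group with its BUDGET tuple on Department.
    return [
        {
            "Department": b["Department"],
            "Sum_of_emp_salaries": totals[b["Department"]],
            "Salary_budget": b["Salary_budget"],
        }
        for b in budget
        if b["Department"] in totals
    ]


def over_budget(departments):
    # Tuple assertion on the derived relation: Sum_of_emp_salaries < Salary_budget.
    return [
        d["Department"]
        for d in departments
        if not d["Sum_of_emp_salaries"] < d["Salary_budget"]
    ]


# Hypothetical sample tuples.
EMP = [
    {"Name": "Smith", "Department": "sales", "Salary": 9000},
    {"Name": "Jones", "Department": "sales", "Salary": 15000},
    {"Name": "Lee", "Department": "toys", "Salary": 8000},
]
BUDGET = [
    {"Department": "sales", "Salary_budget": 30000},
    {"Department": "toys", "Salary_budget": 7000},
]
```

Once DEPARTMENTS is derived, the predicate looks only at values inside each constrained
tuple, which is what makes the top-down formulation simpler to state.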

6.2.3. Tuple Assertions 

It is now appropriate to examine more closely the structure of tuple assertions. In this 
case, the constrained collection is a collection of tuples, obtained from the underlying 
relation by the application of data selection predicates. The assertion predicate then applies 
to each individual tuple in the constrained collection. Tuple predicates are used to specify 
tuple assertions. The restricted expression defines that aspect of each constrained tuple that 
is being delimited. In the simplest case, the restricted expression is some column name of 
the underlying relation. More generally, it may be an expression: an appropriate 
combination of column names, system-provided operators, and user-defined operators. 

It may be possible to formulate a given conceptual assertion in different ways, with 
different restricted expressions. For example, though the tuple assertions "Credit_line -
Debt < 50000" and "Credit_line < Debt + 50000" are logically equivalent, in the former case
the restricted expression is "Credit_line - Debt", while in the latter case it is just
"Credit_line". This flexibility enables the assertion expressor to precisely identify which
data values are to be regarded as dominant, and which as subordinate. In the first case, it
is a combination of the entries Credit_line and Debt that is being delimited, while in the
latter case it is simply the Credit_line entry. This distinction contributes to the abstraction 
power of assertion expression, and has implications for the implementation of constraints 
and for the actions that are to be taken upon the detection of an assertion violation. 

The value which delimits the restricted expression is the restricting expression, which
is computed from some data values which may reside anywhere in the data base. In
particular, these data values (the constraining data) may be outside the constrained tuple.

Tuple predicates may be classified on the basis of the relationship between the
constrained collection and the constraining data:

1. A tuple predicate is local (L) if the constraining data is present in the constrained
tuple. That is, for a local tuple predicate, all data referenced in the predicate is
within the constrained tuple itself.

2. A tuple predicate is nonlocal independent (NI) if the constraining data is data
selected from elsewhere in the data base, but whose selection does not depend on any
data in the constrained tuple.

3. A tuple predicate is nonlocal dependent (ND) if the selection of the constraining
data does depend on data in the constrained tuple.

In figure 6-1, examples 1 and 4 involve L-type tuple predicates, example 2 is an NI-type
tuple predicate, and example 3 is an ND-type tuple predicate.

This classification is in order of increasing complexity. For L-type tuple predicates,
one has only to look at the constrained tuple to determine the restricting expression; the
constraining data is present in the constrained tuple itself. For type-NI tuple predicates,
this is no longer the case. The restricting expression is now computed from data arbitrarily
located in the data base, not confined to the constrained tuple. However, the data from
which the restricting expression is computed is the same for each tuple in the constrained
collection. Thus the restricting expression admits of a one-time computation, with the result
being used for each constrained tuple. For type-ND tuple predicates, the computation of
the restricting expression depends on data in the constrained tuple. It is therefore necessary
to recompute the restricting expression for each individual constrained tuple.
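
The computational difference among the three classes may be sketched as follows. This is only an illustration under stated assumptions: the relations EMP and BUDGET, their columns, and the three example predicates are hypothetical and are not taken from figure 6-1.

```python
# Hypothetical relations, represented as lists of dicts (assumed for illustration).
EMP = [
    {"Name": "Smith", "Department": "sales", "Salary": 20000, "Bonus": 1000},
    {"Name": "Jones", "Department": "sales", "Salary": 18000, "Bonus": 500},
    {"Name": "Davis", "Department": "toys", "Salary": 30000, "Bonus": 2000},
]
BUDGET = [
    {"Department": "sales", "Amount": 50000},
    {"Department": "toys", "Amount": 40000},
]


def check_local(tuples):
    # L: the restricting expression (Salary) lies in the constrained tuple itself.
    return all(t["Bonus"] < t["Salary"] for t in tuples)


def check_nonlocal_independent(tuples):
    # NI: the restricting expression comes from elsewhere in the data base but does
    # not depend on the constrained tuple, so it is computed once and reused.
    limit = max(b["Amount"] for b in BUDGET)
    return all(t["Salary"] < limit for t in tuples)


def check_nonlocal_dependent(tuples):
    # ND: the selection of constraining data depends on the constrained tuple
    # (its Department), so the restricting expression is recomputed per tuple.
    def limit_for(t):
        return next(b["Amount"] for b in BUDGET
                    if b["Department"] == t["Department"])
    return all(t["Salary"] < limit_for(t) for t in tuples)
```

In this sketch the NI check performs one pass over BUDGET in total, while the ND check performs one selection from BUDGET for every constrained tuple.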

There are two dimensions by which we classify local tuple predicates. The first 
dimension measures the complexity of the restricting expression, and has three levels: 


1. The restricted expression is compared via a scalar comparator to a constant, a single
column entry from the constrained tuple, or an expression involving several column
entries from the constrained tuple. (types 1-3)

2. The restricted expression is compared via a set comparator to a set of constants, a
set of column entries from the constrained tuple, a set of single-valued expressions
computed from entries from the constrained tuple, or some expression which yields a
set of values and depends on entries in the constrained tuple. (types 4-7)

3. The restricted expression is compared via a set comparator to a set of constant
tuples, a set of tuples involving entries from the constrained tuple, a set of tuples
composed of single-valued expressions computed from entries from the constrained
tuple, or some expression which yields a set of tuples and depends on entries in the
constrained tuple. (types 8-11)

The second dimension reflects the complexity of the restricted expression, and also 
has three levels: 

a. For types 1-7, the restricted expression is a column entry in the constrained tuple.
For types 8-11, it is a subtuple of the constrained tuple.

b. The restricted expression is a single-valued expression. For types 1-7, the restricted
expression is computed from column entries in the constrained tuple, and yields a
scalar value. For types 8-11, it yields a tuple composed of such column entry
expressions.

c. The restricted expression is a set-valued expression. For types 4-7, it yields a set of
scalars. For types 8-11, it yields a set of tuples. (This level does not apply to types 1-3.)

Figure 6-2 illustrates this classification for local tuple predicates of types 1a-11a.
Consider the relation R (A, B, C, D, E, F) (where columns A, B, and C have underlying 
domain real number and columns D, E, and F have underlying domain character string). 


Some examples of local tuple predicates may be classified as follows:

1. A < 15 (1a),

2. A < B (2a),

3. A < B/C (3a),

4. A is in {"x", "y", "z"} (4a),

5. A is in {"x", E, F} (6a)
(This means that, for each constrained tuple, the entry in column A is in the set
containing the constant "x" and the entries in columns E and F.),

6. (D, E) is in {("x", "y"), ("z", F)} (10a)
(This means that, for each constrained tuple, the subtuple consisting of the entries
from columns D and E equals either the tuple ("x", "y"), or a tuple whose first
component is "z" and whose second component is the F entry of the constrained
tuple.),

7. A + B < C (2b),

8. A + B is in {C + 1, C + 2, C + 3} (6b),

9. {D, E} intersect {"w", "x"} contains {"y", "z"} (4c)
(This means that the intersection of the sets consisting of the entries in columns D
and E and the constants "w" and "x" is a superset of the set containing the constants
"y" and "z".)
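
Because every value a local tuple predicate references lies in the constrained tuple, each such predicate can be viewed as a function of one tuple. The following sketch evaluates several of the predicates above against a single tuple of R. The tuple values are invented for illustration, and the representation of a tuple as a Python dict is an assumption.

```python
# One hypothetical tuple of R (A, B, C real numbers; D, E, F character strings).
row = {"A": 2.0, "B": 3.0, "C": 4.0, "D": "x", "E": "y", "F": "q"}

# Each local tuple predicate is a function of the constrained tuple alone.
local_predicates = {
    "A < 15": lambda t: t["A"] < 15,
    "A < B": lambda t: t["A"] < t["B"],
    "A < B/C": lambda t: t["A"] < t["B"] / t["C"],
    "(D, E) is in {(x, y), (z, F)}":
        lambda t: (t["D"], t["E"]) in {("x", "y"), ("z", t["F"])},
    "A + B < C": lambda t: t["A"] + t["B"] < t["C"],
    "A + B is in {C+1, C+2, C+3}":
        lambda t: t["A"] + t["B"] in {t["C"] + 1, t["C"] + 2, t["C"] + 3},
}

# Evaluate every predicate against the one constrained tuple.
results = {name: predicate(row) for name, predicate in local_predicates.items()}
```

For the tuple shown, the first, second, fourth, and sixth predicates hold, while "A < B/C" and "A + B < C" are violated.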

As for local tuple predicates, nonlocal tuple predicates may be classified on two
dimensions. The first dimension again consists of three levels:

1. The restricted expression is compared via a scalar comparator to a single-valued
expression, which yields a scalar value (and which is computed from data elsewhere
in the data base). (type 1)

2. The restricted expression is compared via a set comparator to a set-valued
expression, which yields a set of scalars. (type 2)

3. The restricted expression is compared via a set comparator to a set-valued
expression, which yields a set of tuples. (type 3)

Again, the second dimension consists of three levels:

a. For types 1-2, the restricted expression is a column entry. For type 3, it is a tuple
of entries which constitutes a subtuple of the constrained tuple.

b. The restricted expression is a single-valued expression. For types 1-2, this
expression is computed from entries in the constrained tuple, and yields a scalar. For
type 3, it yields a tuple composed of such column entry expressions.

c. The restricted expression is a set-valued expression. For type 2, it yields a set of
scalars. For type 3, it yields a set of tuples. (This level does not apply to type 1.)

Figure 6-3 illustrates this classification for nonlocal tuple predicates, of types 1a-3a.
Note that the computation of the restricting expression (scalarval or setval) is independent
of the constrained tuple for NI-type tuple predicates, but dependent for ND-type predicates.
The data selection language must now serve the added role of identifying the constraining
data. For this reason, the classification is coarser for nonlocal tuple predicates than for
local tuple predicates.


6.2.4. Set Assertions

For set assertions, the constrained collection is a collection of tuple sets, obtained from
the underlying relation, as discussed in section 6.2.2. The assertion predicate then applies to
each tuple set in the constrained collection. Set predicates are used to specify set assertions.
The restricted expression is that aspect of each constrained tuple set that is being delimited.
In the simplest case, the restricted expression is the set of entries in some column of the
underlying relation (e.g., the set of Salary entries in EMP). More generally, it may be an
expression: an appropriate combination of column names, system-provided operators, and
user-defined operators. These operators include aggregate arithmetic operators which are
applied to sets of values.

As for tuple assertions, the restricting expression is the value that delimits the 
restricted expression. The constraining data may be, in general, data anywhere in the data 
base. Again, as for tuple assertions, it may be possible to express a given conceptual set 
assertion in several ways. 

Set predicates may be classified on the basis of the relationship between the 
constrained collection and the constraining data: 

1. A set predicate is local (L) if the constraining data is present in the constrained
tuple set. That is, the restricting expression may be computed solely from the
constrained tuple set.

2. A set predicate is nonlocal independent (NI) if the constraining data is data
selected from elsewhere in the data base, but where this selection does not depend
upon the constrained tuple set.

3. A set predicate is nonlocal dependent (ND) if the selection of the constraining data
does depend upon the constrained tuple set.

In figure 6-1, examples 6 and 8 are L-type set predicates, and examples 5 and 7 are NI-type 
set predicates. 

As for tuple predicates, there are two dimensions on which local set predicates may be 
classified. One dimension reflects the complexity of the restricting expression, and the 
other reflects the complexity of the restricted expression. The first dimension has four 
levels: 

1. The restricted expression is compared via a scalar comparator to a constant, an
aggregate function of the entries in some column of the constrained tuple set, or an
expression involving several such aggregates. (types 1-3)

2. As in 1, except that the aggregate functions in the constraining expression are not
computed for a set of scalars, but for a set of tuples; namely, the collection of
subtuples obtained by projecting the constrained tuple set onto two or more columns.
(types 4-6)

3. The restricted expression is compared via a set comparator to a set of constants, the
set of entries in some column of the constrained tuple set, or an expression involving
several such sets. (types 7-9)

4. This is analogous to 3 in the same way that 2 is analogous to 1. That is, the
restricting expression does not deal with scalars, but with sets of subtuples of the
constrained tuple set. (types 10-12)

The second dimension consists of two levels: 

a. For types 1-6, the restricted expression is an aggregate function. For types 7-12, it
is an instantiation of the function "set", which generates the set of values in some
column or the set of subtuples for some group of columns, taken over the constrained
tuple set.

b. For types 1-6, the restricted expression is a single-valued expression computed
from two or more of the aggregate functions described above. For types 7-12, it is a
set-valued expression, computed from two or more instantiations of "set", as described
above.

A special type of local set predicates, the column relationship predicates, are not
included in the above scheme. Column relationship predicates are used to express
properties such as one-to-one correspondences and functional dependencies. To state a
column relationship predicate, two groups of column names from the constrained tuple set
are specified. The relationship between these two groups of columns is then stated. For
example, one may state that for the relation R (A, B, C, D, E, F), there is a one-to-one
correspondence between the column A and the column group (B, C). This means that there
is a one-to-one relationship between the entry in column A and the subtuple formed from
the entries in columns B and C. Note that column relationship predicates are always local.

Figure 6-4 illustrates this classification for local set predicates, types 1a-16a. For
example, for the relation R (A, B, C, D, E, F) (where columns A, B, and C have underlying
domain real number and columns D, E, and F have underlying domain character string),
various local set predicates may be classified, as follows:

1. avg(A) < 15 (1a),

2. avg(A) < sum(B) (2a),

3. count(D, E) < 50 (4a)
(This means that the number of tuples in the relation formed by projecting the
constrained tuple set on columns D and E is less than 50.),

4. set(D) contains {"x", "y", "z"} (7a),

5. set(D) properly contains set(E) union {"y", "z"} (9a),

6. set(D, E) is in {("w", "x"), ("y", "z")} (10a)
(This means that the set of tuples obtained by projecting on columns D and E is a
subset of the set of constant tuples containing ("w", "x") and ("y", "z").),

7. D one-to-one (E, F) (14a),

8. set(D) union set(E) is in set(F) (8b).
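
Several of these local set predicates, including the column relationship predicate, may be sketched as functions of a constrained tuple set. The tuple set below is invented for illustration, and the helper names column_set and one_to_one are hypothetical; column_set plays the role of the function "set" described above.

```python
from statistics import mean

# A hypothetical constrained tuple set drawn from R (A, B, C, D, E, F).
R = [
    {"A": 10.0, "B": 1.0, "C": 1.0, "D": "x", "E": "p", "F": "q"},
    {"A": 12.0, "B": 2.0, "C": 1.0, "D": "y", "E": "p", "F": "r"},
    {"A": 14.0, "B": 3.0, "C": 2.0, "D": "z", "E": "s", "F": "q"},
]


def column_set(tuples, *columns):
    """The function "set": the values of one column, or the subtuples of several."""
    if len(columns) == 1:
        return {t[columns[0]] for t in tuples}
    return {tuple(t[c] for c in columns) for t in tuples}


def one_to_one(tuples, left, right):
    """Column relationship predicate: left and right column groups correspond 1-1."""
    pairs = {(tuple(t[c] for c in left), tuple(t[c] for c in right)) for t in tuples}
    lefts = {l for l, _ in pairs}
    rights = {r for _, r in pairs}
    # Each left value pairs with exactly one right value, and vice versa.
    return len(pairs) == len(lefts) == len(rights)


avg_a_below_15 = mean(t["A"] for t in R) < 15                          # avg(A) < 15
avg_a_below_sum_b = mean(t["A"] for t in R) < sum(t["B"] for t in R)   # avg(A) < sum(B)
count_de_below_50 = len(column_set(R, "D", "E")) < 50                  # count(D, E) < 50
d_contains_xyz = column_set(R, "D") >= {"x", "y", "z"}                 # set(D) contains ...
d_one_to_one_ef = one_to_one(R, ["D"], ["E", "F"])                     # D one-to-one (E, F)
```

Every value consulted here is taken from the constrained tuple set itself, which is what makes these predicates local.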

Nonlocal set predicates may be similarly classified. The first dimension has three
levels:

1. The restricted expression is compared via a scalar comparator to a single-valued
expression, which yields a scalar value (and which is computed from some data in the
data base). (types 1-2)

2. The restricted expression is compared via a set comparator to a set-valued
expression, which yields a set of scalars. (type 3)

3. The restricted expression is compared via a set comparator to a set-valued
expression, which yields a set of tuples. (type 4)

The second dimension consists of two levels:

a. For types 1-2, the restricted expression is an aggregate function. For types 3-4, it is
an instantiation of the function "set", which generates the set of values in some
column or the set of subtuples for some group of columns, taken over the constrained
tuple set.

b. For types 1-2, the restricted expression is a single-valued expression, computed
from two or more of the aggregate functions described above. For types 3-4, it is a
set-valued expression, computed from two or more instantiations of "set", as described
above.

Figure 6-5 illustrates this two dimensional classification for types 1a-4a. Note that the
computation of the restricting expression (scalarval or setval) is independent of the
constrained tuple set for NI-type set predicates, but dependent for ND-type predicates.


6.2.5. Scope of Assertions

It was stated in section 6.2.2 that each assertion is actually an assertion schema: an
assertion is instantiated for and applies to each element of the constrained collection. But
there is another sense in which an assertion may be viewed as a schema. This is by
allowing described rather than explicit references to relation and column names within an
assertion.

For example, one may wish to state an assertion concerning every relation of the data
base which has a Name column, relating that column to the Name column in relation EMP.
This may be handled by allowing column names (and
relation names) to be variables which range over the set of all columns or relations in the
data base (or some specified subset thereof). This is basically a universal quantification of
second order.

Without proposing a specific detailed solution to this problem of explicit scope vs.
described scope, we may observe that such a solution must facilitate a second order
quantification, on a level above the constrained collection. Consider the assertion that, for
each column in the data base named C1, every pair of entries in this column sums to less
than 100. Here the constrained collection is a set of pairs of tuples. The property must hold
for each element of the constrained collection. Furthermore, the assertion actually applies to
each element in a set of constrained collections, there being one such constrained collection
for each column (in the data base) which is named C1.
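
The two levels of quantification in this example may be sketched as two nested loops: an outer loop over the columns selected by description, and an inner loop over the constrained collection for each such column. The data base contents, the relation names, and the function names below are hypothetical; this is not a proposal for how described scope should actually be specified.

```python
from itertools import combinations

# Hypothetical data base: relation name -> list of tuples (dicts).
DATABASE = {
    "R1": [{"C1": 10, "X": 1}, {"C1": 20, "X": 2}],
    "R2": [{"C1": 60}, {"C1": 70}],
    "R3": [{"C2": 500}],
}


def columns_named(database, column_name):
    """Outer (second order) quantification: every column with the given name."""
    for relation_name, rows in database.items():
        if rows and column_name in rows[0]:
            yield relation_name, [row[column_name] for row in rows]


def pair_sum_violations(database, column_name, bound):
    """Inner quantification: every pair of entries in each selected column."""
    violations = []
    for relation_name, entries in columns_named(database, column_name):
        for x, y in combinations(entries, 2):
            if not x + y < bound:
                violations.append((relation_name, x, y))
    return violations


violations = pair_sum_violations(DATABASE, "C1", 100)
```

Here relation R3 contributes no constrained collection at all, since it has no column named C1, while R2 yields the single violating pair (60, 70).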

It has been stated that the scope of a relation constraint assertion can either be
explicit (apply to relations and columns which are constants) or described (apply to relations
and columns which are variables whose ranges are described). It is certainly valid to
question the desirability and practicality of assertions with described scope, and we shall not
take a position on this matter here. Rather, for the purposes of the remainder of this
thesis, it is sufficient to assume that we are dealing with assertions having explicit scope,
although we believe that the extension to assertions having described scope is
straightforward.


6.3. Relation Constraint Validity Requirement 

Another component of a relation constraint is the validity requirement(s): the 
occasion(s) at which the assertion component of the constraint must hold.

One possibility is that an assertion must hold at all times, and consequently must be
checked after any data base change that may cause its violation. Such assertions must
theoretically be checked (verified) after every primitive data base change (such as update,
insert, or delete tuple). Assertions actually need to be checked only if some value(s) are
changed which may cause the assertion to be violated. Some success has been achieved in
automatically determining when an assertion actually needs verification [Eswaran 1975,
Stonebraker 1975c].

In some cases, it is necessary to specify that an assertion need not hold during some
complex data base transaction(s), because it may not be meaningful to verify the assertion
until after the transaction(s) are completed. Such assertions are checked only at the end of
these transactions.

Suppose, for example, that there is an assertion for the example data base of figure 1
which states that exactly two employees in the sales department have a salary of more than
$15,000. Assume that at some time the assertion holds, as employees "Smith" and "Jones"
both have salary $20,000 and work in the sales department. It is now desired to transfer
employee "Smith" out of the sales department, replacing him with employee "Davis" (with
salary $30,000). If the primitive operations update row, insert row, and delete row are the
only operations available and the assertion is checked after each primitive operation, the
desired change cannot be legally accomplished. Thus the verification of this assertion must
be deferred until the entire transaction (which consists of two primitive operations) is
completed.
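
The difference between checking after each primitive operation and deferring the check to the end of the transaction may be sketched as follows. The relation contents beyond "Smith", "Jones", and "Davis", the "toys" department, and all function names are assumptions made for illustration.

```python
import copy

EMP = [
    {"Name": "Smith", "Department": "sales", "Salary": 20000},
    {"Name": "Jones", "Department": "sales", "Salary": 20000},
    {"Name": "Brown", "Department": "sales", "Salary": 12000},
]


def exactly_two_high_paid_in_sales(emp):
    high = [e for e in emp if e["Department"] == "sales" and e["Salary"] > 15000]
    return len(high) == 2


def transfer_smith(emp):
    # Primitive update: move Smith out of the sales department.
    for e in emp:
        if e["Name"] == "Smith":
            e["Department"] = "toys"


def add_davis(emp):
    # Primitive insert: Davis joins the sales department.
    emp.append({"Name": "Davis", "Department": "sales", "Salary": 30000})


def run_transaction(emp, operations, assertion, deferred):
    """Return (resulting state, accepted). A rejected transaction leaves emp as it was."""
    working = copy.deepcopy(emp)
    for operation in operations:
        operation(working)
        if not deferred and not assertion(working):
            return emp, False   # violated after a primitive operation
    if not assertion(working):
        return emp, False       # violated at the end of the transaction
    return working, True


_, immediate_ok = run_transaction(
    EMP, [transfer_smith, add_davis], exactly_two_high_paid_in_sales, deferred=False)
final_state, deferred_ok = run_transaction(
    EMP, [transfer_smith, add_davis], exactly_two_high_paid_in_sales, deferred=True)
```

With immediate checking the transaction is rejected in either order of the two operations, since the intermediate state has one or three highly paid sales employees. With deferred checking it is accepted.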

Consequently, it can be semantically necessary and/or desirable for the constraint
expressor to specify precisely when an assertion is to be checked. For reasons of efficiency,
it is also important to have the ability to specify that an assertion need only be checked at
certain limited times, because verifying it after every data base change that could cause its
violation might be catastrophically expensive.

Accordingly, the validity requirement of a relation constraint should be expressed in
terms of structured operations. For example, the validity requirement of some assertions
might be that the assertion is to be checked after operation raise-salary. Each relation
constraint validity requirement should consist of a list of structured operations after which
the assertion component is to be checked. The special validity requirement "always" has the
function of assuring that the assertion will be checked after any data base change that may
cause its violation.

It may be necessary to check one or more relation constraint assertions after each data
base change is attempted (by a structured operation). The simplest type of data base
change is a primitive update, insert, or delete tuple operation. Slightly more complex is the
set-oriented tuple update, insert, or delete which may be expressed in the high level
nonprocedural data selection and modification language (e.g., SEQUEL). Since structured
operations are hierarchically organized, it may be necessary to check some assertions after
each hierarchic structured operation. Consider, for example, the structured operation A,
which is defined to have the effect of executing a delete tuple operation, followed by the
execution of operation B. Operation B consists of a single update tuple operation. It may
then be necessary to check some assertions after the delete tuple operation, after operation
B, after the update tuple operation (in B), and after operation A.
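
The check points that arise from this hierarchy may be enumerated by a simple recursive traversal. In the sketch below, a structured operation is represented as a (name, steps) pair whose steps are either primitive operation names or nested structured operations; this representation and the function name are assumptions made for illustration.

```python
def execute(operation, check_points):
    """Walk a structured operation, recording each point at which assertions
    whose validity requirement names that operation would be checked."""
    name, steps = operation
    for step in steps:
        if isinstance(step, tuple):
            execute(step, check_points)      # nested structured operation
        else:
            check_points.append(step)        # check point after a primitive operation
    check_points.append("operation " + name)  # check point after the whole operation
    return check_points


OPERATION_B = ("B", ["update tuple"])
OPERATION_A = ("A", ["delete tuple", OPERATION_B])

trace = execute(OPERATION_A, [])
```

The resulting trace lists the four check points of the example in the order in which they would occur: after the delete tuple operation, after the update tuple operation, after operation B, and after operation A.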

A special treatment of "null" (undefined) values as column entries is required. As
noted by Eswaran and Chamberlin [Eswaran 1975], the checking of a relation constraint
assertion should be such that the presence of "null" values should never cause the assertion
to succeed if it would otherwise fail (be violated), and should never cause it to fail if it
would otherwise succeed. An exception to this rule is made for assertions which explicitly
reference "null" values (e.g., "Sex = null").
6.4. Relation Constraint Violation-Action

Associated with every occasion at which an assertion is to be checked is a
violation-action to be taken if the assertion is not satisfied upon attempted verification.
Several types of violation-action can be specified:

1. An error can be signalled, and the requested data base change rejected. A message
is issued informing the user of the problem; the nature of this message may be
explicitly specified as a part of the violation-action, or it may be chosen by the
system.

2. A warning can be issued, but the illegal data base change allowed. The user may
be warned with a system-generated message, or a message specified as part of the
violation-action. The warning may be persistent, in which case it appears whenever
the potentially bad data is referenced.

3. A corrective action can be specified, which attempts to repair the error; the
assertion is then rechecked. This approach may be dangerous, but is appropriate in
some cases. There are several types of corrective action:

a. a substitute value may be specified to replace the offending data,

b. a structured operation may be performed,

c. an external procedure may be called.

If a corrective violation-action is attempted, the relation constraint assertion which
caused its invocation is rechecked after the corrective action is performed. It is
intended that corrected value and structured operation corrective actions handle the
bulk of the corrective violation-action needs. However, it is possible to call an
external procedure (which is written in some high level general purpose
programming language) as a corrective action. This external procedure receives no
special privileges with regard to data base interaction. There are of course other
problems which result from permitting such external procedures to be used, which
are similar to those discussed in the context of domain definition violation-action (see
section 3.4). (A more far-reaching set of problems of this class is discussed by Minsky
[Minsky 1976].)

The actual interface which reports relation constraint violations to the user should
actually allow this user to control the violation-action. The user should be consulted, if
appropriate. For instance, assume that the user wishes to perform an operation which gives
employee "Jones" a 10% raise in salary. Assume also that there is a relation constraint
assertion which states that the sum of salaries of all the employees in each department of
the company must be less than the budget of that department. Assume also that this
assertion would be violated if the salary of "Jones" is increased by 10%. A reasonable
violation-action might be to raise the salary of "Jones" to its maximum permissible value,
while reporting this to the user and asking for approval before performing the
action.
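
A corrective violation-action of this kind, with the user consulted before the correction is applied, may be sketched as follows. Several assumptions are made that are not in the text above: salaries are whole dollars, the assertion is a strict "less than" comparison against the department budget, and the confirm callback stands in for whatever interface consults the user. All names and figures are hypothetical.

```python
def raise_salary(emp, budget, name, percent, confirm):
    """Attempt a percentage raise; on violation, offer the maximum permissible salary."""
    target = next(e for e in emp if e["Name"] == name)
    department = target["Department"]
    others = sum(e["Salary"] for e in emp
                 if e["Department"] == department and e is not target)
    requested = target["Salary"] * (100 + percent) // 100
    # Largest whole-dollar salary keeping the department sum strictly below budget.
    max_permissible = budget[department] - 1 - others
    if requested <= max_permissible:
        target["Salary"] = requested
        return "applied"
    # Violation: propose the corrective value and let the user decide.
    if max_permissible > target["Salary"] and confirm(name, requested, max_permissible):
        target["Salary"] = max_permissible
        return "corrected"
    return "rejected"


EMP = [
    {"Name": "Jones", "Department": "sales", "Salary": 20000},
    {"Name": "Smith", "Department": "sales", "Salary": 25000},
]
BUDGET = {"sales": 46500}

outcome = raise_salary(EMP, BUDGET, "Jones", 10, confirm=lambda *args: True)
```

In this sketch the requested salary of 22000 would violate the assertion, so the user is offered 21499 instead. The assertion still holds after the corrective action, which corresponds to the recheck described above.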


In this scheme, the violation-actions are associated with the assertion: they are part
of the relation constraint. This means that violation-action information is not a part of the
specification of the structured operations; all information regarding the checking of an
assertion is localized in the relation constraint. This has the desirable effect of eliminating
the [...] of violation-action information.
6.5. Implementation Considerations

A relation constraint language processor may be used to "compile" relation constraints
into an internal form. Relation constraints may be added to and deleted from a data base.
(A constraint may be changed by deleting it and adding a new version.) Adding a
relation constraint consists of its compilation and initial checking. Normally, the constraint
must be satisfied when it is added to the data base.

The internal form into which a relation constraint is compiled is used by the semantic
integrity subsystem to check the integrity of the data base, and to take appropriate action
when violations are detected. Moreover, the integrity subsystem manages all four aspects
of semantic integrity, as discussed above and in chapter 7.


6.6. Remarks 
The principal purpose of this chapter has been to impose some structure on the 
problem of relation constraint specification in the context of the semantic integrity of a
relational data base. Important issues to be considered in future work include:
1. a detailed analysis of the applicability of specific high level, nonprocedural data
selection languages to assertion specification (e.g., SEQUEL, QUEL, or Query by
Example),
2. a complete description of a disciplined specification methodology for relation
constraints (including detailed example(s) of relation constraint specification),
3. specifications of the user interface of the semantic integrity subsystem, vis-a-vis
relation constraints,
4. an analysis of the impact of the semantic integrity subsystem on other aspects of
the data base system (e.g., data security),
5. an assessment of the ramifications of various problems concerning relation
constraints, including:
a. redundancies,
b. contradictions,
c. circularities (because of corrective action side effects),
6. a study of implementation techniques for relation constraint checking.
7. ON THE DESIGN OF A SEMANTIC INTEGRITY SUBSYSTEM

The purpose of this chapter is to present some brief comments on several important
aspects of the design of a semantic integrity subsystem. The purpose of such a subsystem is
to manage the semantic integrity of a data base, as indicated by the semantic integrity
specifications for that data base.

7.1. Components of a Semantic Integrity Subsystem

We propose that a semantic integrity subsystem possesses four principal components:

1. The semantic integrity language processors translate the specifications in the high
level semantic integrity languages into internal forms useful to the semantic integrity
subsystem. As discussed in this thesis, there are four semantic integrity languages, for
domain definition, relation structure, structured operations, and relation constraints.
(Actually, these four languages may be viewed as sublanguages of a single semantic
integrity language.)

2. The semantic integrity checker determines which domain definitions and relation
constraints need to be checked after a given data base change is performed, and
performs that checking.

3. The semantic integrity violation-action processor takes appropriate action when a
domain definition or relation constraint is violated.

4. The relation constraint compatibility checker is responsible for insuring that the set
of relation constraints currently extant for a data base is free from contradictions and
other undesirable properties. The compatibility checker may be called by the relation
constraint language processor when adding a new relation constraint, to make sure
that it is acceptable to add it. The problem of designing and implementing a
compatibility checker involves general techniques of deductive inference, automated
theorem provers, etc. Only a very limited compatibility checker could be practical at
the present time.


7.2. The User's View of the Integrity Mechanism

It is extremely important to provide an effective user - data base system interface,
especially with regard to the creation, maintenance, and reporting of semantic integrity
information. There are three major types of users with which one needs to be
concerned:

1. the data base administrator (DBA), which may in fact be a single person or many
persons, whose job is to create and maintain the semantic integrity specifications,

2. the nonprogramming user, who deals with the data base by means of generalized
data selection and modification languages (e.g., SEQUEL, QUEL, or Query by
Example),

3. the applications program, which calls upon data base system facilities.

Of course, a single person may serve both as a DBA and a (nonprogramming) user. The
distinction between nonprogramming users and applications programs is made in order to
distinguish the types of communication with the semantic integrity subsystem which are
necessary.

The DBA should be provided facilities which allow the following types of actions:

1. add relation,

2. delete relation,

3. add domain,

4. delete domain,

5. add structured operation,

6. delete structured operation,

7. add relation constraint,

8. delete relation constraint.

It should also be possible for a DBA to change the structure of relations, and modify the
definition of domains, structured operations, and relation constraints. It is furthermore
desirable to allow the DBA to ask questions about the semantic integrity specifications,
especially the relation constraints. For example, it should be possible to ask which
constraints may possibly be violated if an entry in a given column is changed, or which
constraints have a given column entry as constrained data.

The nonprogramming user must be provided with high level reporting of semantic
integrity violations and violation-actions. In general, a (nonprogramming) user sees a set of
data structures (domains and relations), a set of structured operations, and a set of relation
constraints. When a domain definition or relation constraint is found to be violated, the
user is either informed of this fact or an automatic corrective action is attempted. In any
case, it must be possible to provide the user with a high level "error message". The
semantic integrity subsystem must not be completely silent (e.g., see [Stonebraker 1974d,
Stonebraker 1975c]). It must also be possible for the user to interact with the semantic
integrity subsystem to attempt to repair an error, should that be appropriate.

The applications program must be provided with capabilities similar to those for
nonprogramming users, but all communication must be accomplished via procedure call and
return, and message passing protocols.


7.3. Some Thoughts on Integrity Subsystem Implementation 
Although a detailed investigation of implementation techniques for semantic integrity
subsystems is an important research topic, little has been done on it to date. Stonebraker
and Wong [Stonebraker 1974d, Stonebraker 1975c] have proposed a very clean "query
modification" approach to integrity checking, but this scheme has some limitations (e.g.,
some useful types of techniques for the optimization of integrity checking are not handled).
Sarin [Sarin 1976] is currently investigating this topic in some detail. In this thesis, we are
not principally concerned with the specifics of implementation techniques. However, we
shall discuss a few important aspects of semantic integrity subsystem implementation.

First of all, it is important that a data base logging and backup facility exist. This is
crucial in allowing the actions of a structured operation (transaction) to be “backed out” and
“undone”, if occasioned by the violation of a domain definition or relation constraint.
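
The backing out of a structured operation can be illustrated with a minimal sketch. It assumes an in-memory table held as a Python dictionary and a log of before-images; the names `UndoLog` and `run_transaction` are hypothetical and are not part of the subsystem described in this thesis.

```python
class UndoLog:
    """Hypothetical before-image log for one structured operation."""

    def __init__(self, table):
        self.table = table    # dict: key -> tuple, held as a dict of column values
        self.entries = []     # (key, before-image or None), in order of change

    def write(self, key, new_row):
        # Record the before-image first, then apply the change.
        old = self.table.get(key)
        self.entries.append((key, None if old is None else dict(old)))
        self.table[key] = new_row

    def back_out(self):
        # Undo in reverse order, restoring each before-image.
        for key, before in reversed(self.entries):
            if before is None:
                self.table.pop(key, None)   # the tuple was newly inserted
            else:
                self.table[key] = before
        self.entries.clear()


def run_transaction(table, changes, assertion):
    """Apply all changes; back them out if the assertion fails afterwards."""
    log = UndoLog(table)
    for key, row in changes:
        log.write(key, row)
    if not assertion(table):
        log.back_out()
        return False
    return True
```

In this sketch the assertion is evaluated once, at the end of the operation; a real facility would also have to survive system failure, which an in-memory log does not address.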

It is sometimes the case that a data base change will cause several domain definitions
and relation constraints to be checked. (A data base change is accomplished by the
invocation of a primitive or structured operation.) A scheme must be developed for
determining in what order these are to be checked. One way to handle this is to assign
priorities to domain definitions and relation constraints; this may be done by the DBA or
automatically by the semantic integrity subsystem. Domain definitions should receive
priority over relation constraints (since they are always checked after primitive operations),
and the various types of relation constraints can be ordered by their complexity, importance,
or some other metric.
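
One possible reading of such a priority scheme is sketched below. The `Check` class, the numeric priorities, and the convention that a smaller number is checked earlier are all assumptions made for illustration; the text only requires that domain definitions precede relation constraints.

```python
DOMAIN, RELATION = 0, 1  # domain definitions sort ahead of relation constraints


class Check:
    """One domain definition or relation constraint to be verified (hypothetical)."""

    def __init__(self, name, kind, priority, holds):
        self.name = name          # label used when reporting a violation
        self.kind = kind          # DOMAIN or RELATION
        self.priority = priority  # smaller number = checked earlier (assumed)
        self.holds = holds        # predicate on the data base state


def first_violation(checks, state):
    """Run checks in priority order; return the first failing name, or None."""
    ordered = sorted(checks, key=lambda c: (c.kind, c.priority))
    for check in ordered:
        if not check.holds(state):
            return check.name
    return None
```

Sorting on the pair (kind, priority) places every domain definition ahead of every relation constraint, whatever priorities the DBA assigns within each group.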

Since relation constraint checking is potentially a costly undertaking, it is crucial that
efficient checking techniques be developed. Much of the work on optimizing data selection
and modification languages is relevant here. Heuristics may be developed for determining,
on the basis of the patterns of data base interaction, which access paths and aids to
maintain [Hammer 1976b]. One type of useful heuristic involves the maintenance of
aggregate values. For example, if there is a relation constraint assertion which states that
the sum of employee salaries is less than $100,000, it may be helpful to maintain the sum
and update it as necessary, rather than constantly recalculating it when the assertion is
checked. Other types of heuristics may also prove useful, e.g., dealing with characteristics
of individual types of physical storage devices (such as data clustering and page
arrangement), or dealing with the maintenance and use of inversions (indices).
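
The maintained-aggregate heuristic for the salary example might look as follows. This is only a sketch: the class `SalaryTotal` and its methods are hypothetical names, and salaries are held in a dictionary keyed by employee name.

```python
class SalaryTotal:
    """Keeps a running sum of salaries so the assertion needs no re-summation."""

    def __init__(self, salaries, limit=100_000):
        self.salaries = dict(salaries)             # Name -> Salary
        self.limit = limit
        self.total = sum(self.salaries.values())   # computed once, at start-up

    def set_salary(self, name, salary):
        # Adjust the stored sum by the difference instead of re-summing.
        new_total = self.total - self.salaries.get(name, 0) + salary
        if not new_total < self.limit:
            raise ValueError("sum of salaries must be less than %d" % self.limit)
        self.salaries[name] = salary
        self.total = new_total

    def remove(self, name):
        # A deletion can only lower the sum, so no check is needed here.
        self.total -= self.salaries.pop(name)
```

Each update costs a constant amount of work, at the price of keeping the stored sum consistent with every change to the Salary column.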


7.3.1. The Use of Inversions in Relation Constraint Checking (An Example)

As an example illustrative of the usefulness of inversions in relation constraint
checking, consider an example assertion. Suppose that the assertion (for the example data
base of figure 1-2) states that for each tuple B in relation BUDGET, the entry in the
Salary_budget column (B.Salary_budget) is greater than or equal to the sum of the entries
in the Salary column of the tuples in EMP (E1, ..., En) which have Department =
B.Department. Several primitive operations which may require this assertion to be checked
are listed below, along with the method by which the necessary checking may be
accomplished and an indication of which inversions would be helpful in such checking:

1. for some tuple B in BUDGET, Salary_budget is changed:

   a. find all tuples in EMP (E1, ..., En) which have Department = B.Department,
   b. calculate S = E1.Salary + ... + En.Salary,
   c. check that S <= B.Salary_budget,
   useful inversions: Department in EMP (for step a),

2. for some tuple E in EMP, Salary is changed:

   a. find all tuples in EMP (E1, ..., En) which have Department = E.Department,
   b. calculate S = E1.Salary + ... + En.Salary,
   c. find the tuple in BUDGET (B) which has Department = E.Department,
   d. check that S <= B.Salary_budget,
   useful inversions: Department in EMP (for step a), Department in BUDGET (for
   step c),

3. for some tuple in BUDGET (B), Department is changed:

   (same as 1),

4. for some tuple in EMP (E), Department is changed:

   (same as 2),

5. a new tuple is inserted into BUDGET (B):

   (same as 1),

6. a new tuple is inserted into EMP (E):

   (same as 2).

In this particular example, no checking needs to be done when tuples are deleted from
EMP, since that can only cause the sum (S) to decrease. Of course, this is not true for all
assertions involving sums of this type.
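
The checking procedure for the salary budget assertion can be sketched with inversions modelled as dictionaries from a column value to the tuples having that value. The helper names `build_inversion` and `budget_holds` are hypothetical, and tuples are represented as Python dictionaries purely for illustration.

```python
from collections import defaultdict


def build_inversion(tuples, column):
    """Inversion (index): column value -> list of tuples having that value."""
    index = defaultdict(list)
    for t in tuples:
        index[t[column]].append(t)
    return index


def budget_holds(dept, emp_by_dept, budget_by_dept):
    """Check S <= B.Salary_budget for one department, using both inversions."""
    # Find the EMP tuples of the department and sum their salaries.
    total = sum(e["Salary"] for e in emp_by_dept.get(dept, []))
    # Find the BUDGET tuple(s) of the department and compare.
    for b in budget_by_dept.get(dept, []):
        if total > b["Salary_budget"]:
            return False
    return True
```

Only the department touched by a change needs to be rechecked, which is where the two inversions on Department pay off: neither EMP nor BUDGET has to be scanned in full. A department with no BUDGET tuple satisfies the assertion vacuously in this sketch.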

8. REMARKS AND DIRECTIONS


The major purpose of this thesis has been to provide a comprehensive, detailed
analysis of the issues and problems associated with maintaining semantic integrity in a
generalized (relational) data base system. The principal emphasis has been on the high
level expression of semantic integrity specifications. The major portion of the work
described herein has been concerned with providing a framework for semantic integrity
specifications. Both the functional requirements for a solution to the semantic integrity
problem and a specific approach to providing such a solution have been emphasized. An
attempt has been made to indicate important directions for further work on semantic
integrity.

By way of conclusion, there are several important general directions for the extension 
of the work described in this thesis. The following are most significant:

1. an analysis of important integrity specification language design issues (e.g., the
usefulness of constructs in languages like SEQUEL, QUEL, and Query by Example,
the adequacy of nonprocedural specification methodologies, the importance of
iteration and recursion, etc.),

2. the complete design of a language for semantic integrity specification, including
sublanguages for each of the four aspects of semantic integrity (in the relational data
model),

3. the development of a well-directed, structured, disciplined approach to data base
design (based on the semantic integrity framework),

4. a comprehensive example of the application of the semantic integrity specification
methodology described herein to a “real” application domain,

5. the implementation of the semantic integrity subsystem outlined in this thesis,

6. an analysis of the cost of building, maintaining, and enforcing semantic integrity
rules,

7. a study of the relationship of semantic integrity issues with those of security,
concurrent consistency, and query processing (including the use of deductive
techniques),

8. an evaluation of the ramifications of separating the four aspects of integrity to the
extent described above (e.g., an analysis of whether it is necessary to allow the
information within a domain definition to be referenced in relation constraint
assertions), and a study of the appropriateness of this approach,

9. an evaluation of the applicability of a behavioral approach to the description of
data semantics in an integrated data base environment,

10. the extension of the semantic integrity scheme to allow multiple “views” of a data
base,

11. an evaluation of possible extensions to permit a nonabsolutist approach to integrity
(involving the notions of quantized truth and confidence measures [Zadeh 1976]),

12. a study of the ability of the approach to the semantic integrity problem described
in this thesis to improve the overall effectiveness of a data base system.

Figure 1-1. Relation EMP


column ->     Name             Sex      Salary    Manager          Department
underlying
domain ->     NAME             SEX      MONEY     NAME             DEPT

              Jones, Richard   male     $12,888   Jones, Richard   research
              Phillips, Jeff   male     $18,688   Smith, Kathy     sales
              Smith, Kathy     female   $11,888   Jones, Richard   sales

Figure 1-2. Example Data Base


Domains:
    NAME     QUAN
    SEX      ORDER_NUM
    MONEY    CUST
    DEPT     DATE
    ITEM

Relations:

    EMP (Name, Sex, Salary, Manager, Department)
         NAME  SEX  MONEY   NAME     DEPT

    SALES (Item, Department, Quantity on hand, Cost)
           ITEM  DEPT        QUAN              MONEY

    ORDERS (Order_number, Customer, Item, Date_shipped)
            ORDER_NUM     CUST      ITEM  DATE

    BUDGET (Department, Salary_budget)
            DEPT        MONEY


Figure 1-3. A Possible Set of Relational Primitive Operations


create domain
delete domain
create relation
delete relation
    (these operations allow domains and
    relations to be defined and deleted)

insert tuple
delete tuple
update tuple
    (these operations allow changes to be
    made to data in relations)

add column to relation
delete column from relation
copy relation
intersection
union
difference
join
    (these operations facilitate relation
    modification and relational algebraic
    manipulation of a data base)

Figure 3-1. Selected Example Data Base Domain Definitions


domain NAME ("Smith, John")
    description
        last: string
        ', '
        first: string
    ordering
        last, first
    violation-action
        error


domain SEX ("female")
    description
        oneof 'female', 'male'
    ordering
        none
    violation-action
        error 'sex must be female or male'


domain MONEY ("$188")
    description
        '$'
        value: number where >=0
        where length(right(*, '.' + 1)) » =
            or not present *, '.'
    ordering
        value
    violation-action
        substitute null 'value in error, null has been assumed'


domain ITEM ("AB-75-326")
    description
        string where not has numerics, '-'
        i1: '-'
        i2: string where not has alphabetics, '-'
        where repetitions i1 through i2 >=1 and <=3
      or
        string where call check_item
    ordering
        call compare_item
    violation-action
        substitute left(*, 5)

Figure 3-1. (continued)


domain QUAN (17)
    description
        value: number where integer and >=0
    ordering
        atomic
    violation-action
        call fixup_quan


domain DATE ("1/26/1976")
    description
        month: oneof 1, ..., 12
        '/'
        day: number where integer and >=1 and <=31
        '/197'
        year: number where integer and >=5 and <=9
        where (if (month = 4 or =6 or =9 or =11) then day <= 30)
          and (if month = 2 then day <= 29)
          and (if (month = 2 and year ~= 6) then day <= 28)
    ordering
        year, month, day
    violation-action
        error
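
To make the DATE definition concrete, the following sketch tests membership in that domain and derives its ordering. It assumes the reading of the figure in which the year digit ranges over 5 to 9 (1975 to 1979) and the 30-day months are 4, 6, 9, and 11; the function names `in_date_domain` and `date_ordering_key` are hypothetical.

```python
import re

# month '/' day '/197' year-digit, mirroring the description clause
DATE_PATTERN = re.compile(r"(\d{1,2})/(\d{1,2})/197(\d)")


def in_date_domain(value):
    """Return True if value satisfies the DATE description and where-restriction."""
    m = DATE_PATTERN.fullmatch(value)
    if m is None:
        return False
    month, day, year = (int(g) for g in m.groups())
    if not (1 <= month <= 12 and 1 <= day <= 31 and 5 <= year <= 9):
        return False
    if month in (4, 6, 9, 11) and day > 30:
        return False
    if month == 2 and day > 29:
        return False
    if month == 2 and year != 6 and day > 28:   # 1976 is the only leap year in range
        return False
    return True


def date_ordering_key(value):
    """Ordering clause 'year, month, day' expressed as a sort key."""
    month, day, year = value.split("/")
    return (int(year), int(month), int(day))
```

A violation-action of `error` would correspond to rejecting the update whenever `in_date_domain` returns False.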

Figure 3-2. Syntax of the Domain Definition Language


domain-definition ::= DOMAIN domain-name
                      DESCRIPTION
                          description-clause
                      [ORDERING
                          ordering-clause]
                      [VIOLATION-ACTION
                          violation-action-clause]


domain-name ::= string-constant


description-clause ::= description-subclause
                     | description-clause
                       OR
                       description-subclause


description-subclause ::= description
                          [where-restriction]


description ::= [label:] subunit
              | description
                [label:] subunit


label ::= string-constant


subunit ::= STRING [WHERE string-boolean]
          | NUMBER [WHERE number-boolean]
          | ONEOF string-constant-list
          | ONEOF number-constant-list


string-constant-list ::= string-constant-component
                       | string-constant-list, string-constant-component


string-constant-component ::= string-constant
                            | ALPHABETICS
                            | NUMERICS
                            | SPECIALS


number-constant-list ::= number-constant
                       | number-constant-list, number-constant


string-boolean ::= string-boolean-term
                 | string-boolean OR string-boolean-term


string-boolean-term ::= string-boolean-factor
                      | string-boolean-term AND string-boolean-factor

Figure 3-2. (continued)


string-boolean-factor ::= string-boolean-primary
                        | NOT string-boolean-primary


string-boolean-primary ::= string-predicate
                         | (string-boolean)


string-predicate ::= comparator string-constant
                   | IF string-predicate THEN string-predicate
                     [ELSE string-predicate]
                   | SIZE comparator number-expression
                   | HAS string-constant-list
                   | CALL procedure


comparator ::= = | ~= | > | >= | < | <=


number-boolean ::= number-boolean-term
                 | number-boolean OR number-boolean-term


number-boolean-term ::= number-boolean-factor
                      | number-boolean-term AND number-boolean-factor


number-boolean-factor ::= number-boolean-primary
                        | NOT number-boolean-primary


number-boolean-primary ::= number-predicate
                         | (number-boolean)


number-predicate ::= comparator number-constant
                   | IF number-predicate THEN number-predicate
                     [ELSE number-predicate]
                   | INTEGER
                   | EXPONENTIAL
                   | CALL procedure


where-restriction ::= boolean


boolean ::= boolean-term
          | boolean OR boolean-term


boolean-term ::= boolean-factor
               | boolean-term AND boolean-factor


boolean-factor ::= boolean-primary
                 | NOT boolean-primary


boolean-primary ::= predicate
                  | (boolean)

Figure 3-2. (continued)


predicate ::= expression comparator expression
            | IF predicate THEN predicate
              [ELSE predicate]
            | PRESENT expression, string-constant-list
            | CALL procedure


expression ::= [addition-operator] unsigned-expression


unsigned-expression ::= arithmetic-term
                      | unsigned-expression addition-operator arithmetic-term


arithmetic-term ::= arithmetic-factor
                  | arithmetic-term multiply-operator arithmetic-factor


arithmetic-factor ::= subexpression
                    | (expression)


subexpression ::= atomic-expression
                | set-function(expression-list)
                | APPEND(expression, expression)
                | SUBSTRING(expression, expression, expression)
                | LEFT(expression, expression)
                | RIGHT(expression, expression)
                | LOCATION(expression, expression)
                | LENGTH(expression)
                | REPETITIONS label THROUGH label


atomic-expression ::= label
                    | string-constant
                    | number-constant
                    | *


expression-list ::= expression
                  | expression-list, expression


set-function ::= MAXIMUM | MAX | MINIMUM | MIN | string-constant

addition-operator ::= + | -

multiply-operator ::= * | / | **

ordering-clause ::= ordering-list
                  | NONE
                  | ATOMIC
                  | CALL procedure

Figure 3-2. (continued)


ordering-list ::= label
                | ordering-list, label


violation-action-clause ::= violation-action
                          | violation-action-clause
                            violation-action


violation-action ::= ERROR
                   | ERROR message
                   | SUBSTITUTE expression
                   | SUBSTITUTE expression message
                   | CALL procedure
                   | CALL procedure message


message ::= string-constant
          | SYSTEM-GENERATED


procedure ::= string-constant


Notes:


The nonterminals string-constant and number-constant are not
further defined.


ALPHABETICS refers to the characters "A" through "Z" and "a"
through "z", NUMERICS refers to the digits 0 through 9, and
SPECIALS refers to all other characters.


SIZE returns the length of a string subunit. "HAS s1, ..., sn"
returns "true" if a subunit has an occurrence of each of the
strings s1, ..., sn (otherwise "false"). These may appear
only in subunit where restrictions.


SUBSTRING(s,i1,i2) returns the substring of string s starting
at character i1 and extending i2 characters. LEFT(s,i) and
RIGHT(s,i) return the left and right substring (respectively)
of s having length i. SUBSTRING, LEFT, and RIGHT may also be
invoked with a second argument which is a string. This means
that the substring is to start at the leftmost or rightmost
occurrence of the second string argument, e.g., "LEFT(*, '.')"
and "LEFT(*, INDEX(*, '.'))" are equivalent. LENGTH(s) returns
the length of string s. APPEND(s1,s2) concatenates s1 and s2.
LOCATION(s1,s2) returns the index of the first occurrence of
s2 in s1 (or 0 if s2 is not a substring of s1). REPETITIONS
s1 THROUGH s2 returns the number of repetitions (of the domain
value) for subunits labeled s1 through s2.
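
The string functions in the notes can be mimicked in a few lines, which may help when reading the example domain definitions. This is only a sketch of the integer-argument forms; it assumes that character positions are counted from 1 (suggested by LOCATION returning 0 when nothing is found), and the lower-case Python names are hypothetical stand-ins for the language's built-ins.

```python
def location(s1, s2):
    """1-based index of the first occurrence of s2 in s1, or 0 if absent."""
    return s1.find(s2) + 1


def left(s, i):
    """Left substring of s having length i."""
    return s[:i]


def right(s, i):
    """Right substring of s having length i."""
    return s[max(len(s) - i, 0):]


def substring(s, i1, i2):
    """Substring of s starting at 1-based character i1 and extending i2 characters."""
    return s[i1 - 1:i1 - 1 + i2]


def append(s1, s2):
    """Concatenation of s1 and s2."""
    return s1 + s2
```

For instance, `left("AB-75-326", 5)` yields `"AB-75"`, which is the value a violation-action of the form substitute left(*, 5) would store for that item.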



Figure 6-1. Some Simple Assertions (for data base in figure 1-2)


Note: CC means constrained collection, PR means predicate


1. The salary of every employee is less than 50,000.
   CC: each tuple in EMP
   PR: Salary < 50000

2. The manager of each employee is also an employee.
   CC: each tuple in EMP
   PR: Manager is present in set of all Names from tuples
       in EMP

3. The salary of each employee in the toy department is less
   than the salary of his manager.
   CC: each tuple in EMP where Department = 'toy'
   PR: Salary < Salary of the tuple where Name = Manager
       in constrained tuple

4. The salary of an employee cannot decrease.
   CC: each tuple in EMP
   PR: new Salary >= old Salary

5. The average employee salary is at least equal to the salary
   of Robert Jones.
   CC: set of tuples in EMP
   PR: average(Salary) >= Salary of tuple where Name =
       'Jones, Robert'

6. Each department has at most two employees with a salary of
   more than $50,000.
   CC: set of tuples in EMP where Salary > 50000, grouped
       by common Department
   PR: count(Name) <= 2

7. The number of female employees is at least 40% of the total
   number of employees.
   CC: set of tuples in EMP where Sex = 'female'
   PR: count(Name) >= .4 * count of all tuples in EMP

8. Employee names are unique.
   CC: set of tuples in EMP
   PR: multiset(Name) has no duplicates
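
Several of these assertions translate directly into predicates over a collection of tuples. The sketch below assumes EMP is held as a list of Python dictionaries; the function names and the 50,000 limit used for the salary assertion are illustrative assumptions, not the notation of this thesis.

```python
from collections import Counter


def salary_below(emp, limit=50000):
    """Tuple predicate: every Salary is less than the limit."""
    return all(t["Salary"] < limit for t in emp)


def managers_are_employees(emp):
    """Tuple predicate: each Manager appears among the Names in EMP."""
    names = {t["Name"] for t in emp}
    return all(t["Manager"] in names for t in emp)


def female_share_at_least(emp, fraction=0.4):
    """Set predicate: count of female tuples >= fraction * count of all tuples."""
    females = sum(1 for t in emp if t["Sex"] == "female")
    return females >= fraction * len(emp)


def names_unique(emp):
    """Set predicate: the multiset of Names has no duplicates."""
    counts = Counter(t["Name"] for t in emp)
    return all(n == 1 for n in counts.values())
```

The first two functions examine one constrained tuple at a time, while the last two must look at the whole collection, which is the distinction the CC component expresses.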

Figure 6-2. Local Tuple Predicates


Types of Predicates (a):


1a. col scalarcomp const
2a. col scalarcomp col
3a. col scalarcomp colexpr


4a. col setcomp {const-1, ..., const-n}
5a. col setcomp {col-1, ..., col-n}
6a. col setcomp {colexpr-1, ..., colexpr-n}
7a. col setcomp setexpr


8a. (col-1, ..., col-n) setcomp {(const-11, ..., const-1n), ...,
    (const-m1, ..., const-mn)}

9a. (col-1, ..., col-n) setcomp {(col-11, ..., col-1n), ...,
    (col-m1, ..., col-mn)}

10a. (col-1, ..., col-n) setcomp {(colexpr-11, ..., colexpr-1n), ...,
     (colexpr-m1, ..., colexpr-mn)}

11a. (col-1, ..., col-n) setcomp setexpr


Definitions:


col: column name with optional "old" or "new"
     (col-i, col-11, etc., are cols; all cols must
     reference entries within the constrained tuple)

const: constant from an appropriate domain

scalarop: +, -, *, /, **, max, min, etc., or a user-defined
          scalar operator

setop: union (also written as {}), intersection, difference,
       or a user-defined set operator

colexpr: a legal combination of col, const, scalarop, and setop which
         yields a single value

setexpr: same as colexpr except yields a set of values

scalarcomp: =, ~=, >, >=, <, <=, or a user-defined scalar
            comparator

setcomp: is in, contains, properly is in, properly contains,
         or a user-defined set comparator

Figure 6-3. Nonlocal Tuple Predicates


Types of Predicates (a):

1a. col scalarcomp scalarval
2a. col setcomp setval
3a. (col-1, ..., col-n) setcomp setval


(In type 2a setval is a set of values, and in type 3a setval
is a set of tuples.)


Definitions:

Definitions here are the same as figure 6-2, except:


scalarval: a scalar value computed from the data base
setval: a set value computed from the data base


NO predicates are the same as NI predicates, except that the
process selecting scalarval and setval may reference the entries
in the constrained tuple.

Figure 6-4. Local Set Predicates


Types of Predicates (a):


1a. aggfn(col) scalarcomp const
2a. aggfn(col) scalarcomp aggfn(col)
3a. aggfn(col) scalarcomp aggfnexpr


4a. aggfn(col-1, ..., col-n) scalarcomp const
5a. aggfn(col-1, ..., col-n) scalarcomp aggfn(col-1, ..., col-m)
6a. aggfn(col-1, ..., col-n) scalarcomp aggfnexpr


7a. set(col) setcomp {const-1, ..., const-n}
8a. set(col) setcomp set(col)
9a. set(col) setcomp setfnexpr


10a. set(col-1, ..., col-n) setcomp {(const-11, ..., const-1n), ...,
     (const-m1, ..., const-mn)}

11a. set(col-1, ..., col-n) setcomp {(col-11, ..., col-1n), ...,
     (col-m1, ..., col-mn)}

12a. set(col-1, ..., col-n) setcomp setfnexpr


13a. col crel col

14a. col crel (col-1, ..., col-m)

15a. (col-1, ..., col-n) crel col

16a. (col-1, ..., col-n) crel (col-1, ..., col-m)


Definitions:


(col, const, scalarop, setop, colexpr, scalarcomp, setcomp are as
in figure 6-2)

aggfn: set, max, min, avg, sum, count, or a user-defined
       aggregate function (also all these with "'", e.g.,
       "set'", meaning duplicates are retained)

crel: one-to-one, functionally-dependent, or a user-defined
      column relationship comparator

aggfnexpr: a legal combination of aggfn, col, const, scalarop, setop,
           and colexpr

setfnexpr: a legal combination of "set", col, const, scalarop, setop,
           and colexpr


"Set" returns the set of values in a column (or tuples in a group
of columns). It is an aggfn, but is also treated separately since
it yields a set value.

(Note that "max(set(Salary))" is equivalent to "max(Salary)".)
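
The column relationship comparators (crel) of types 13a through 16a can be checked over a set of tuples as sketched below. The function names are hypothetical, and the argument order (which column group determines which) is an assumption, since the figure does not state the direction of "functionally-dependent".

```python
def functionally_dependent(tuples, determinant, dependent):
    """True if equal values in the determinant columns imply equal dependent values."""
    seen = {}
    for t in tuples:
        key = tuple(t[c] for c in determinant)
        val = tuple(t[c] for c in dependent)
        # setdefault stores val the first time a key is met, then returns the stored one.
        if seen.setdefault(key, val) != val:
            return False
    return True


def one_to_one(tuples, cols_a, cols_b):
    """True if each column group functionally determines the other."""
    return (functionally_dependent(tuples, cols_a, cols_b)
            and functionally_dependent(tuples, cols_b, cols_a))
```

Passing lists of one or several column names covers all four predicate shapes with the same pair of functions.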




Figure 6-5. Nonlocal Set Predicates


Types of Predicates (a):


1a. aggfn(col) scalarcomp scalarval
2a. aggfn(col-1, ..., col-n) scalarcomp scalarval


3a. set(col) setcomp setval
4a. set(col-1, ..., col-n) setcomp setval


(In type 3a, setval is a set of scalars, and in type 4a, setval
is a set of tuples.)


Definitions:


Definitions here are the same as figure 6-4, except:


scalarval: a scalar value computed from the data base
setval: a set value computed from the data base


NO predicates are the same as NI predicates, except that the
process selecting scalarval and setval may reference the data in
the constrained tuple set.